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I  INTRODUCTION 


The  central  theme  of  our  research  is  the  recovery  of  information 
about  the  three-dimensional  structure  and  physical  characteristics  of 
the  surfaces  depicted  in  an  image — their  shapes,  locations,  and 
photometric  properties.  The  main  obstacle  to  the  recovery  of  such 
information  is  that  images  are  inherently  ambiguous  representations  of 
the  scenes  they  depict.  They  are  two-dimensional  views  of  three- 
dimensional  space,  single  slices  in  time  of  ongoing  physical  and 
semantic  processes,  and  the  light  waves  from  which  images  are 
constructed  carry  limited  information  about  the  surfaces  from  which 
these  waves  are  reflected.  Consequently,  interpretation  cannot  be  based 
strictly  on  information  contained  in  the  image;  it  must  also  involve 
some  combination  of  a  priori  models,  constraints,  and  assumptions. 

Our  approach  has  been  to  identify  and  model  physically  meaningful 
information  that  can  be  used  to  constrain  the  interpretation  process. 
The  models  we  have  developed  fall  into  two  categories:  those  that  can  be 
used  to  reconstruct  and  label  scene  content  directly  (e.g.,  models 
relating  surface  geometry  and  the  physical  structure  of  edges  to  the 
intensity  variations  visible  in  the  image),  and  those  models  that 
identify  the  scene  context  and  the  conditions  under  which  the  image  was 
produced  (e.g.,  camera  and  illumination  models,  and  models  of  the 
semantic  content  of  the  scene).  The  second  class  of  models  is  more 
global  in  nature  and  is  used  primarily  to  constrain  and  control  the 
application  of  the  lower-level  models  in  the  first  class. 


The  research  personnel  involved  in  this  program  include 
M.  A.  Fischler,  J.  M.  Tenenbaum*,  H.  G.  Barrow,*  A.  P.  Witkin,* 
M.  R.  Lowry,  G.  Smith  and  A.  P.  Pentland. 


II  RESEARCH  ACCOMPLISHMENTS 

Much  of  our  early  work  addressed  the  issue  of  recovering  surface 
geometry  (e.g.,  orientation  and  depth)  and  reflectance  from  the  assumed 
continuity  of  the  surfaces,  knowledge  of  the  nature  of  the  edges 
bounding  the  surfaces,  and  the  intensity  variations  measured  in  the 
image.  Surface  perception  plays  a  fundamental  role  in  early  visual 
processing,  both  in  humans  and  in  machines.  An  explicit  representation 
of  surface  structure  is  necessary  for  many  low-level  visual  functions 
Involved  in  such  applications  as  terrain  modeling,  remote  sensing, 
navigation,  manipulation,  and  obstacle  avoidance.  It  is  also  a 
prerequisite  for  general-purpose  vision  systems  capable  of  human-level 
performance  in  such  tasks  as  object  recognition  and  scene  description. 
References  [l]  and  [2]  describe  our  work  on  surface  modeling. 

Our  efforts  in  the  later  stages  of  this  project  were  concerned  with 
the  issues  of  edge  modeling  (i.e.,  edge  classification),  illumination 
and  intensity  modeling  (i.e.,  the  relationship  between  surface 
reflectance  and  image  intensity),  and  modeling  of  the  geometric 


*  Now  employed  at  Fairchild  Camera  and  Instrument  Corporation,  Palo 
Alto,  California. 
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constraints  introduced  by  the  ioaging  process  (camera  model  and 
vanishing  points) . 

We  have  made  significant  progress  in  the  following  problem  areas: 
(A  description  of  this  work  is  presented  in  references  [3]  through  [6] 
and  Appendices  A  through  C.) 


(1)  The  ability  to  identify  the  physical  nature  of  an  edge 
(e.g.,  occlusion  edge,  shadow  edge,  etc.)  solely 
according  to  its  photometric  appearance  in  a  black-and- 
white  image.  This  capability  is  necessary  for  recovering 
surface  orientation  and  shape  from  a  single  image  on  the 
basis  of  shading  and  texture  variations  [Ref  3] . 

(2)  The  ability  to  quickly  locate  straight  parallel  edges 
characteristic  of  man-made  structures  (e.g.,  the  edges  of 
buildings)  [Ref  3]. 

(3)  The  ability  to  recover  absolute  scene  intensity 
information  without  calibration  data  (such  as  a  step 
wedge  exposed  on  the  image)  based  on  knowing  the  identity 
of  the  material  composition  of  the  surfaces  at  a  few 
points  in  the  image.  This  capability  is  necessary  for 
partitioning  the  image  into  labeled  regions  of  a  given 
material  type  [Ref  3] . 

(4)  The  ability  to  identify  the  skyline  in  an  image  on  the 
basis  of  simple  models  of  the  relationship  among  land, 
sky,  and  cloud  brightness  and  texture  [Appendix  C] . 


In  Appendices  A  and  B  we  provide  new  results  concerning  the 
question  of  how  to  recover  the  shape  of  a  surface  from  its  appearance  in 
an  image. 


The  relationship  between  image  intensity  and  the  orientation  of  the 
surface  responsible  for  that  image  is  usually  expressed  in  terms  of  the 
image  irradiance  equation.  This  equation  requires  that  the 
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characteristics  of  the  scene  illumination  and  the  surface  material  be 
known.  The  shape-from-shading  task  (recovering  surface  orientation  from 
image  irradiance)  has  amounted  to  solving  this  equation.  Of  the  various 
methods  that  might  be  used  to  solve  this  equation,  relaxation-style 
algorithms  seem  to  offer  the  greatest  potential  when  we  are  working  with 
natural  imagery.  To  recover  surface  orientation,  relaxation-style 
algorithms  based  on  the  image-irradiance  equation  employ  additional 
constraints.  These  constraints,  which  are  needed  to  supplement  the 
underdetermined  image-irradiance  equation,  usually  capture  the  concept 
of  smoothness.  While  smoothness  superficially  determines  the 
relationship  between  image  irradiance  and  surface  orientation,  we  have 
found  that  smoothness  is  too  weak  a  concept  to  propagate  boundary 
conditions  and  is  thus  equally  ineffectual  as  a  means  of  recovering  the 
required  solution. 

In  Appendix  A  we  derive  a  new  formulation  for  the  shape-from- 
shading  task;  we  have  traded  the  need  to  know  the  explicit  form  of  the 
scene-radiance  function  for  the  assumption  that  material  scatters  light 
isotropically.  This  model  is  applicable  to  natural  scenery  without 
additional  assumptions  about  illumination  conditions  or  the  albedo  of 
the  surface  material  (the  model  also  demonstrates  some  competence  even 
when  the  scattering  is  not  isotropic). 

In  Appendix  B  we  address  the  question:  What  constraints  on  surface 
shape  are  imposed  by  local  image  data  from  a  single  image?  We  show  that 
a  local  analysis  of  image  shading,  with  no  other  knowledge  about  the 


viewed  scene,  may  be  used  to  estimate  surface  shape.  Specifically,  we 
prove  that,  given  a  single  image,  the  following  conditions  obtain: 

(1)  Surface  orientation  may  be  determined  exactly  at 
umbilical  points  (points  with  equal  principal  curvatures) 
on  a  Lambertian  surface  through  a  local  analysis  of  the 
image  without  knowledge  of  illumination,  surface 
curvature,  albedo,  or  boundary  conditions.  Recovery  of 
surface  orientation  allows  local  determination  of 
illuminant  direction,  surface  curvature,  and  the  product 
of  illuminant  intensity  and  surface  albedo. 

(2)  The  image  intensity,  together  with  its  first  and  second 
derivatives  at  each  image  point,  are  identical  to  those 
of  an  imaged  Lambertian  umbilical  point  with  unique 
orientation,  curvature,  (overhead)  illumination  and 
product  of  albedo  times  illumination  intensity.  This 
implies  that  it  is  impossible  for  any  local  analysis  to 
determine  whether  or  not  (a)  a  surface  is  Lambertian,  and 
(b)  the  principal  curvatures  are  equal. 

When  a  person  views  a  scene,  he  has  an  appreciation  of  where  he  is 
relative  to  the  scene,  which  way  is  up,  the  general  geometric 
configuration  of  the  surfaces  (especially  the  support  and  barrier 
surfaces),  and  the  overall  semantic  context  of  the  scene.  The  research 
results  we  have  described  can  provide  similar  information  to  constrain 
the  more  detailed  interpretation  requirements  of  machine  vision  (e.g., 
such  tasks  as  stereo  compilation  and  image  matching) . 
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Local  analysis  of  image  shadit.  in  the  absence  of  prior  knowledge  about  the  viewed 
scene,  may  be  used  to  provide  information  about  the  scene.  The  following  has  been  proved. 

Every  image  point  has  the  same  image  intensity  and  first  and  second  derivatives  as 
the  image  of  an  umbilical  point  (a  point  with  equal  principal  curvatures)  on  a  Lambertian 
surface;  there  is  exactly  one  combination  of  surface  orientation,  curvature,  (overhead) 
illumination  direction  and  albedo  times  illumination  intensity  that  will  produce  a  particular 
set  of  image  intensity  and  first  and  second  derivatives.  A  solution  for  the  unique  combination 
of  surface  orientation,  etc.,  at  umbilical  points  is  presented 

This  solution  has  been  extended  by  using  general  position  and  regional  constraints  to 
obtain  estimates  of  the  following: 

•  Surface  orientation  at  each  image  point 

•  Whether  the  surface  is  planar,  singly  or  doubly  curved  at  each  point 

•  The  mean  illuminant  direction  within  a  region 

•  Whether  a  region  is  convex,  concave,  or  is  a  saddle  surface. 

Algorithms  to  recover  illuminant  direction,  identify  discontinuities,  and  estimate  sur¬ 
face  orientation  have  been  evaluated  on  both  natural  and  synthesized  images,  and  have  been 
found  to  produce  useful  information  about  the  scene. 
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LOCAL  SHADING  ANALYSIS 


I.  Introduction 

A  spatially  restricted  analysis  of  an  image  is  logically  the  first  stage  of  any  visual 
system.  This  initial  stage  of  analysis  is  especially  important  because  it  determines  what 
information  will  be  available  to  the  remainder  of  the  visual  system;  if  a  rich  description 
of  the  world  can  be  computed  locally,  there  is  a  smaller  computational  load  placed  on  the 
remainder  of  the  system.  It  is,  therefore,  important  to  acertain  as  much  about  the  world  as 
possible  at  this  first  stage  of  processing. 

Biological  visual  systems  conform  to  this  principal.  There  is  overwhelming  evidence 
that  they  devote  a  large  percentage  of  their  neurons  to  an  initial  local  analysis  of  the  image. 
Thus,  assessment  of  the  limits  and  potential  uses  of  a  local  analysis  can  be  expected  to 
provide  insight  into  both  machine  and  biological  vision  problems. 

What  information  is  available  locally?  When  we  examine  a  small  neighborhood  around 
an  image  point,  we  often  find  only  small  changes  in  shading  (changes  in  image  intensity1  ).  It 
is  unusual  to  find  a  contour  passing  through  an  image  point.  Thus,  if  we  are  to  learn  about 
scene  characteristics  from  local  examination  of  an  image,  we  must  concern  ourselves  with 
shading2  The  main  question  posed  in  this  paper  will  therefore  be:  What  information  can,  or 
cannot,  be  recovered  from  an  unfamiliar  image  through  a  local  analysis  of  shading?  In  the 
following  sections  I  shall  first  discuss  the  limitations  that  are  inherent  in  any  local  analysis 
of  image  shading,  and  then  show  how  information  about  the  scene  can  be  determined  by 
means  of  additional  constraints  derived  from  general  position  and  the  distribution  of  data 
within  homogeneous  image  regions.  Proofs  of  the  various  propositions  are  presented  in  the 
appendix. 

Previous  work.  Horn  and  his  colleagues  [2],  [3]  have  analyzed  the  process  of  image  forma¬ 
tion  and  have  developed  several  numerical  integration  schemes  for  using  image  intensity  to 
solve  for  object  shape.  These  shape-from-shading  techniques,  however,  require  considerable 
a  priori  knowledge  of  the  scene,  and  they  function  by  propagating  constraint  from  boundary 
conditions  (such  as  those  provided  by  smooth  occluding  contours)  over  the  surface  whose 
shape  is  to  be  estimated.  These  techniques,  therefore,  cannot  be  applied  to  an  unfamiliar, 
unanalyzed  scene  and  do  not  perform  the  purely  local  analysis  I  wish  to  consider  here. 

Bruss  [4]  has  addressed  the  question  of  whether  shape  can  be  derived  from  shading 
using  a  purely  local  analysis  (again  with  a  considerable  a  priori  knowledge  of  the  scene 
assumed).  She  proved  that  no  shape-from-shading  technique  can  yield  a  unique  solution 
without  additional  constraints  which,  in  certain  restricted  cases  (most  importantly  electron 

•To  avoid  confusion,  the  term  ‘image  intensity”  will  be  used  throughout  this  document,  rather  than  the 
technically  more  correct  ‘image  irradiance,”  both  for  the  flux  per  unit  area  falling  on  the  image  plane 
and  for  the  measured  image  irradiance.  The  two  may  be  assumed  to  be  numerically  equal;  and  thus  the 
distinction  has  little  significance  for  the  task  at  hand. 

*For  the  purposes  of  this  paper  we  shall  restrict  our  attention  to  shading,  because  the  problem  of  estimating 
shape  from  local  texture  information  has  already  received  much  attention,  e.g.,  (IT]  and  [18|. 


PENTLAND 


LOCAL 


OiCfri 


ANALYSIS 


Figure  1.  A  simple  model  of  Image  generation*  N  is  the  surface  normal,  L  the  illumination 
direction,  V  the  viewer’s  direction.  If  X  is  the  flux  emitted  toward  the  surface,  p  the  average  reflectance  of 
the  surface,  and  we  assume  distant  light  source  and  a  Lambertian  reflectance  function  for  the  surface,  then 
the  image  intensity  /  is  given  by  /  =*  pX(N  •  L). 

micrographs),  may  be  provided  by  the  bounding  contour  of  the  surface.  Bruss,  however, 
dealt  mostly  with  the  question  of  what  cannot  be  obtained  from  an  analysis  of  shading; 
the  question  of  what  can  be  accomplished  with  a  local  analysis  of  shading  was  not  fully 
explored.  It  is  this,  consequently,  that  we  discussed  below. 

A.  Image  Formation 

Before  we  can  make  quantitative  statements  about  the  limitations  or  usefulness  of 
a  local  analysis  of  shading,  we  must  first  develop  a  mathematical  model  of  the  image 
generation  process.  Figure  1  shows  a  simple  model  of  image  generation:  a  distant  point- 
source  illuminant  at  direction  L,  a  patch  of  surface  with  surface  normal  N,  and  a  viewer 
in  direction3  V,  We  will  assume  orthographic  projection;  note,  however,  because  the  model 
is  purely  local  orthographic  and  perspective  projection  are  identical  except  at  points  of 
discontinuity. 

The  surface  normal  N,  the  viewer’s  direction  V  and  the  illuminant  direction  L  are 
unit  vectors  in  Cartesian  three-space.  As  they  are  unit  vectors,  two  parameters  suffice  to 
specify  them,  the  third  being  determined  by  the  constraint  that  they  have  unit  magnitude. 
Two  parameters  that  are  often  chosen  are  the  slant  a  and  the  tilt  r.  The  tilt  of  a  surface 
is  the  image-plane  component  of  surface  orientation  and  is  equal  to  tan ~*(\In/xn)>  where 
xn  and  vn  are  the  x  and  y  components  of  the  surface  normal.  The  slant  of  the  surface  is 
the  depth  component  of  surface  orientation  and  is  equal  to  cos-,(r;v)>  where  zs  is  the  z 

® All  boldface  variables  (e.g.,  N,  L,  p  etc.)  represent  three-  dimensional  vectors  (x,y,  z),  all  other  variables 
are  scalars.  The  (*,  y)  plane  is  taken  parallel  to  the  image  plane,  so  that  V  “  (0,0, 1). 
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component  of  the  surface  normal. 

The  image  intensity  J  is  in  general  given  by 

I  -  p\( N  •  L)/?(N,L,  VXN  •  V)-‘ 

where  p,  the  albedo,  is  the  portion  of  incident  light  that  is  reflected,  X  is  the  amount  of  light 
incident  upon  the  surface  and  /?(N,  L,  V)  is  the  reflectance  function,  which  describes  how 
much  of  the  reflected  light  leaves  in  each  direction.  The  amount  of  incident  light  reflected  in 
the  viewer’s  direction  V  is  a  function  of  the  illuminant  direction  L  and  the  surface  normal 
N.  The  term  (N  L)  describes  the  amount  of  light  incident  upon  the  surface,  while  the  term 
(NV)-‘  describes  the  foreshortening  that  occurs  during  projection  into  the  image4  .  A 
Lambertian  reflectance  function,  an  idealization  of  rough,  matte  surfaces,  is  defined  as 

f?(N,L,  V)  =  N  •  V 

which  is  proportional  to  the  reciprocal  of  the  foreshortening  caused  by  the  projection  term. 
Thus,  for  a  Lambertian  surface  the  reflectance  function  and  the  effect  of  projection  cancel 
each  other,  and  the  equation  for  image  intensity  becomes 

J  =  ,X(NL)  (1) 

Thus,  the  assumption  of  a  Lambertian  reflectance  function  is  equivalent  to  the  assumption 
that  the  scattering  of  incident  light  is  isotropic.  We  shall  assume  a  Lambertian  reflectance 
function. 

Generality  of  the  auumptions.  The  assumption  of  a  distant  point-source  illuminant 
and  a  Lambertian  reflectance  function  is  not  as  restrictive  as  it  might  at  first  seem.  We 
note,  for  instance,  that  for  a  Lambertian  surface  any  constant  distribution  of  illumination 
is  equivalent  to  a  single  distant  point-source  illuminant;  this  follows  from  application  of  the 
mean  value  theorem.  Because  we  are  concerned  only  with  local  analysis,  the  requirement 
that  the  distribution  of  illumination  be  constant  is  almost  trivially  met5  .  Therefore,  local 
inferences  derived  with  this  single-illuminant/  Lambertian  model  will  generally  be  valid 
whenever  the  surface  scatters  incident  light  in  an  isotropic  manner,  regardless  of  the  actual 
distribution  of  illumination. 

B.  The  Derivatives  Of  Image  Intensity 

The  image  intensity  /  and  the  surface  normal  N  are  different  at  each  point  (x,y)  in 
the  image,  and  thus  are  perhaps  better  written  /(x,  y)  and  N(x,y).  However,  when  they  are 
discussed  at  a  particular  point  P,  they  will  be  written  as  simply  I  and  N.  Similarly,  we 
shall  write  dl  and  t/N  to  designate  the  first  derivative  of  image  intensity  and  the  surface 
normal,  respectively,  at  a  point  P  in  the  direction  (dx,dy).  The  partials  of  /,  N,  and  other 
variables  will  be  denoted  by  subscripts,  i.e.,  /,  =  dl/dx  and  N,  =  dN/dy. 

*In  other  terminology,  N  •  L  is  equal  to  the  cosine  of  the  incident  angle,  and  N  •  V  is  equal  to  the  cosine 
of  the  exitent  angle. 

*Only  illumination  near  the  “horizon’’  of  the  surface  patch  causes  a  problem;  in  this  case  there  is  some 
self-occlusion  and  thus  somewhat  different  illumination  at  neighboring  points. 
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If  we  are  examining  a  small,  homogeneous  region  of  an  image,  it  is  reasonable  to 
assume  that  the  illumination  and  albedo  of  the  surface  change  very  little,  and  so  we  may 
treat  L,  p,  and  X  as  constants.  If  we  also  assume  a  Lambertian  reflectance  function,  so  that 
Equation  (1)  applies,  then 

dl  =  d(p\( N  •  L))  =  p\(dS  •  L)  +  pX(N  •  dL)  =  pX(dN  •  L)  (2) 

The  term  (N  •  dL)  is  zero  because  L  was  assumed  constant.  Similarly,  the  second  derivative 
of  image  intensity  is 

<Pl  =  </(pX(rfN  •  L))  =  p\(<P N  •  L)  +  p\{dN  ■  dL)  —  /iXft^N  •  L)  (3) 

Thus,  the  second  derivative  of  image  intensity  depends  upon  the  second  derivative  of  the 
surface  normal,  just  as  the  first  derivative  depended  upon  the  first  derivative  of  the  surface 
normal. 


K.  Local  Shading  Information:  Limitations  And  Potential 

.  Before  we  can  know  what  is  possible  to  accomplish  with  local  shading  information,  it 
is  important  to  characterize  what  cannot  be  done.  The  following  proposition  describes  the 
fundamental  limitation  which  is  inherent  to  any  local  analysis  of  image  shading: 

Proposition  1.  The  image  of  a  Lambertian  umbilical  point  (a  point  with 
equal  principal  curvatures)  can  produce  any  combination  of  image  intensity  I 
and  derivatives  I x  ,  /y ,  Ixz%  fyy  and  Ix y  • 

This  proposition  says  that  when  we  view  a  point  on  a  surface,  regardless  of  what  the 
actual  surface  curvatures  are  or  what  the  actual  surface  reflect  ance  function  is,  the  resulting 
image  point  always  looks  like  an  umbilical  point  on  a  Lambertian  surface.  This  proposition 
implies,  therefore,  that  it  is  impossible  for  a  local  analysis  of  the  image  to  determine 
unambiguously  whether  a  surface  is  Lambertian  and  whether  the  principal  curvatures  are 
equal;  there  will  always  be  the  possibility  that  the  observed  point  is  an  umbilical  point  on  a 
Lambertian  surface.  We  cannot  resolve  these  ambiguities  by  resorting  to  higher  derivatives 
of  image  intensity  because,  although  more  measurements  are  obtained  by  measuring  the 
higher  derivatives,  each  additional  derivative  brings  in  more  unknowns  than  measurements. 

We  can  see  by  the  following  argument  that  this  proposition  is  likely  to  be  true. 
Consider  that  at  each  point  in  an  image  we  can  measure  the  intensity,  and  its  first  and 
second  derivatives  to  obtain  six  independent  measurements,  which  are  /,  Ix,  /„,  Ixx,  lyy 
and  lXy.  To  specify  the  image  intensity  of  an  umbilical  point  on  a  Lambertian  surface0 
requires  six  independent  parameters:  r  the  surface  tilt7  ,  o  the  surface  slant8  R  the  radius 
of  curvature,  (I1J2,  \/l  —  If  —  /|)  the  illuminant  direction,  and  pX  the  surface  albedo  times 
illuminant  intensity.  Solving  for  these  six  unknowns  requires  at  least  one  measurement  for 

®The  set  of  all  possible  images  of  umbilical  points  is  obtained  by  considering  surfaces  of  the  form  «(*,y)  «* 
\/ Ra  —  ar8  —  y®  for  particular  values  of  R  >  0,  R  >  *  >  —  R,  R  >  y  >  —R. 

7Tilt  is  the  image-plane  component  of  surface  orientation. 

'Slant  is  the  depth-component  of  surface  orientation. 
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each  unknown;  thus,  the  measurement  of  intensity,  first  and  second  derivatives  can  at  most 
establish  the  six  parameters  required  to  specify  a  Lambertian  umbilical  point.  No  additional 
measurements  are  available  to  determine  whether  the  surface  curvatures  are  unequal  or 
whether  the  reflectance  function  is  Lambertian. 

Proposition  1  leaves  open  the  possibility  that  there  may  be  a  great  many  combinations 
of  surface  orientation,  curvature,  etc.,  corresponding  to  each  combination  of  image  intensity 
and  its  derivatives.  If  the  equations  for  image  intensity  were  linear,  there  would  be  exactly 
one  combination  of  the  six  parameters  that  would  correspond  to  the  observed  measurements. 
Although  the  equations  are  not  linear,  the  following  proposition  shows  that  there  are  only 
two  possible  combinations  of  these  factors  that  will  yield  a  particular  combination  of  image 
intensity  and  derivatives. 

Proposition  2.  Given  the  image  of  an  umbilical  point  on  a  Lambertian 
surface  with  image  intensity  I  and  derivatives  fxt  /„,  Izx,  Iyy  and  Ixy  there  are 
two  possible  combinations  of  surface  orientation,  curvature,  illuminant  direction 
and  surface  albedo  times  illuminant  intensity,  one  with  the  illuminant  direction 
above  the  line  of  sight,  the  other  exactly  opposite  surface  tilt  and  illuminant 
tilt. 

Thus,  the  ambiguity  present  in  local  image  shading  is  not  much  greater  than  was 
evident  from  the  first  proposition;  there  is  only  the  additional  ambiguity  that  arises  from 
a  symmetry  involving  the  illuminant  direction  and  the  tilt  of  the  surface.  This  symmetry 
results  in  the  interrelation  of  the  direction  of  illumination  and  the  convexity  of  the  surface; 
if  the  illuminant  direction  is  taken  to  have  the  opposite  tilt  (e.g.,  from  above  the  line  of 
sight  to  below  it)  the  convexity  of  the  surface  will  reverse.  Therefore,  one  cannot  determine 
the  convexity  of  the  surface  unless  something  is  known  about  the  illuminant  direction  [1], 

[51- 

Using  propositions  1  and  2,  we  can  produce  an  exhaustive  characterization  of  the 
limitations  of  any  local  analysis  of  shading.  A  local  analysis  of  shading  cannot 

•  Determine  the  sign  and  magnitude  of  the  surface  curvatures8  e.g.,  whether  the 
surface  is  convex,  concave  or  a  saddle-shaped  and  whether  or  not  the  curvatures 
are  unequal. 

•  Determine  the  surface  reflectance  function. 

•  Separate  the  surface  albedo  from  illuminant  intensity. 

A.  Solving  For  Image  Formation  Parameters  At  An  Umbilical  Point 

The  amount  of  information  wc  can  extract  from  a  local  analysis  of  shading  (given  that 
we  are  viewing  the  image  of  a  Lambertian  umbilical  point)  is  surprising.  For  such  image 
points  we  can  solve  for  every  parameter  in  the  image  formation  process.  This  seems  to 
indicate  that  there  is  approximately  two  degrees  of  freedom  left  undetermined  by  the  local 
shading  information  when  we  view  a  point  within  a  homogeneous  region  of  an  image:  the 
ratio  of  the  surface  curvatures  and  the  degree  to  which  the  surface  is  non-Lambertian. 

The  umbilical-point  case  is  the  most  complex  situation  in  which  all  of  the  image 
formation  parameters  may  be  recovered  locally.  The  fact  that  there  are  only  relatively 

•With  the  additional  constraint  provided  by  general  position,  we  can  determine  when  the  curvatures  are 
zero.  This  is  shown  in  the  following  section. 
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Figure  S.  The  manner  In  which  Image  curvature  "spreads”  Indicates  the  tilt  of  the 
surface.  This  may  be  understood  bjr  imagining  that  we  could  observe  the  lines  of  curvature  on  a  surface 
directly.  They  would  look  just  like  the  lines  drawn  in  this  figure.  If  we  were  looking  straight  down  on  the 
surface  of  a  sphere,  the  lines  of  curvature  would  appear  perpendicular,  as  in  (a).  As  we  tilted  the  surface  to 
■  •ne  side,  the  lines  of  curvature  would  appear  progressively  more  spread,  as  in  (b)  and  (c).  Different  directions 
of  tilt  cause  spreading  in  different  directions,  as  demonstrated  in  (d).  The  amount  of  spread  depends  on  the 
slant  of  the  surface. 

few  additional  parameters  required  to  obtain  a  reasonably  general  model  suggests  that  the 
umbilical-point  solution  may  provide  us  with  a  useful  (albeit  simplified)  model  of  how  the 
various  portions  of  the  image  formation  process  evidence  themselves  in  the  image,  and  may 
also  prove  useful  as  a  tool  for  analyzing  image  points.  The  umbilical-point  solution  for 
surface  orientation,  for  example,  is  instructive  to  examine.  How  can  surface  orientation  be 
determined  from  local  shading  information? 

Imagine  that  we  could  observe  the  lines  of  curvature  on  a  surface  directly.  They  would 
look  like  the  lines  drawn  in  Figure  2.  If  we  were  looking  straight  down  on  the  surface  of  a 
sphere,  the  lines  of  curvature  would  appear  perpendicular,  as  in  Figure  2  (a).  As  we  tilted 
the  surface  to  one  side,  the  lines  of  curvature  would  appear  progressively  more  spread,  as  in 
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Figures  2  (b)  and  (c).  Different  directions  of  tilt  would  cause  spreading  in  different  directions, 
as  demonstrated  in  (d). 

We  cannot  observe  lines  of  curvature  on  the  surface  directly,  of  course,  but  we  can 
observe  the  interaction  of  surface  curvature  with  the  illuminant  in  the  second  derivatives 
of  image  intensity.  The  second  derivative  of  image  intensity  has  three  components:  Ixx  and 
Iyy,  the  “curvature”  of  image  intensity  along  the  x  and  y  axes,  respectively,  and  Ixy,  the 
“spread”  of  those  curvatures.  Just  as  with  the  spread  of  the  lines  of  curvature,  the  direction 
in  which  the  spread  term  is  greatest  is  also  the  direction  of  the  surface  tilt.  The  direction  in 
which  this  spread  is  greatest  also  turns  out  to  be  the  direction  along  which  dPl  is  greatest, 
and  hence  the  following  proposition: 

Proposition  3.  Given  the  image  of  an  umbilical  point  on  a  Lambertian 
surface,  the  tilt  of  the  surface  r  is  the  image  direction  in  which  the  second 
derivative  of  image  intensity  cPl  is  greatest. 

Thus,  for  Lambertian  umbilical  points  the  tilt  may  be  determined  from  the  second 
derivative  of  image  intensity  directly,  without  a  priori  knowledge.  This  leaves  only  the 
surface  slant  to  be  determined. 

In  Figure  2  the  direction  of  the  spread  indicated  the  tilt  of  the  surface.  Similarly, 
the  amount  of  the  spread  indicates  the  slant  (depth)  component  of  the  surface  orientation. 
Measuring  the  magnitude  of  this  spread  relative  to  the  total  curvature  (as  measured  by 
the  Laplacian)  provides  an  indicator  of  the  surface  slant,  as  described  in  the  following 
proposition. 

Proposition  4.  Given  the  image  of  an  umbilical  point  of  a  Lambertian 
surface,  the  surface  slant  a  is  given  by 


=  cos-1 


kV*I-(k*  +  l)I3 


*y 


*V2/  +  (*2 +  !)/,„ 


where  k  —  tan-1  r. 

The  slant  and  the  tilt  propositions  together  determine  surface  orientation  exactly.  Note 
that  neither  the  slant  nor  the  tilt  estimate  requires  any  knowledge  of  illuminant  direction, 
surface  albedo,  curvature,  or  illuminant  intensity. 


B.  Unlikely  Umbilical-Point  Solutions:  Constraint  From  General  Position 

Although  the  umbilical-point  solution  will  always  provide  us  with  an  interpretation 
that  is  consistent  with  the  local  image  data,  it  sometimes  yields  an  interpretation  of  an 
image  point  that  strikes  us  as  unlikely  because  the  umbilical-point  solution  requires  an 
unlikely  configuration  of  orientation,  illumination,  or  viewer  direction. 

When  we  observe  an  image  point  for  which  the  umbilical-point  solution  requires  an 
unlikely  configuration,  we  have  available  the  additional  constraint  provided  by  general 
position  to  help  us  interpret  the  image  data.  This  additional  constraint  allows  us  to  reject 
the  umbilical-point  solution  and  infer  that  something  special  has  occurred  in  the  image 
formation  process  —  something  that  may  permit  further  analysis. 

Zero  second  derivative.  One  such  case  arises  when  one  or  both  of  the  second  derivatives 
are  zero.  When  one  of  the  second  derivatives  is  zero,  the  umbilical-point  solution  is  a 
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<5^  ^ 

CONVEX !  k,»  0,k2>  0  CONCAVE>k1<0,ka<0 


Figure  S.  Surface  types.  Surfaces  may  be  classified  into  five  types:  planar,  cylindrical,  convex,  concave, 
or  saddle  surface.  The  classification  of  a  surface  depends  on  whether  the  two  principal  curvatures  *ej  and 
k 2  are  positive,  negative,  or  zero. 


surface  patch  whose  orientation  is  exactly  perpendicular  to  the  line  of  sight.  When  both 
of  the  second  derivatives  are  zero,  the  umbilical-point  solution  requires  that  the  illuminant 
direction  be  exactly  in  the  image  plane.  These  interpretations  of  the  image  point  are  unlikely 
because  precise  alignment  of  surface  orientation  or  of  illuminant  direction  is  necessary,  i.e., 
this  interpretation  of  the  image  point  presupposes  a  violation  of  general  position.  The 
more  likely  inference  when  a  zero  second  derivative  is  observed  is  that  one  of  the  surface 
curvatures  is  zero.  The  fact  that  this  inference  is  valid  (shown  in  the  appendix)  allows  us 
to  partially  classify  the  surface  type. 


Surface  points  may  be  classified  into  five  types:  planar,  cylindrical,  convex,  concave, 
or  saddle  surface.  These  five  types  are  shown  in  figure  3.  The  classification  of  a  surface 
depends  on  whether  the  two  principal  curvatures  k j  and  k2  are  positive,  negative,  or  zero: 
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plane 

Kl  =  0 

*2=0 

cylinder 

K|  ft  0 

*2  =  0 

convex 

Ki  <  0 

*2  <  0 

concave 

Ki  >  0 

*2  >  0 

saddle  surface 

Kl  >  0 

*2  <  0 

One  important  step  in  identifying  the  type  of  surface  is  determining  when  the  principal 
curvatures  are  zero,  as  this  allows  us  to  classify  the  surface  as  planar,  cylindrical,  or  doubly 
curved. 

When  one  of  the  principal  curvatures  is  zero,  the  surface  normal  does  not  change  as 
we  travel  along  the  surface  in  the  direction  of  that  principal  curvature.  Because  the  surface 
normal  N  does  not  change  along  that  direction  (let  us  specify  the  direction  by  (dz,  dy)),  we 
know  that10  dN  =  0  along  ( dx,dy ).  Since 

dl  =  p\dN  •  L  (2) 

we  see  that  dl  mjst  also  be  zero  along  ( dz,  dy ).  Unfortunately,  the  reverse  inference  is  not 
generally  (rue,  because  dl  is  zero  along  some  direction  for  every  image  point.  Therefore, 
we  cannot  infer  that  </N  =  0  in  direction  (dz,  dy)  just  because  dl  =  0  along  that  direction. 

That  problem  does  not  occur  when  we  observe  d2/  =  0  along  a  direction  (dz,  dy). 
When  the  surface  normal  does  not  change  along  a  direction  (dz,dy),  then  t^N  =  0  along 
(dz,dy).  Since 

d2/*=pXd2NL  (3) 

we  see  that,  when  =  0  along  (dz,dy),  then  d2/  must  also  be  zero  along  (dz,dy).  For  the 
second  derivative,  the  reverse  inference  —  that  d®N  =  0  because  d2/  =  0  —  is  generally 
valid,  for  when  we  observe  that  d2/  =  0  along  direction  (dz,dy),  we  can  conclude  that 
either  (1)  e^N  is  perpendicular  to  L  or  (2)  that  d2 N  ==  0.  As  it  is  unlikely  that  d2 N 
is  perpendicular  to  L  for  any  distance,  we  may  legitimately  conclude  that,  if  we  observe 
that  d2/  =  0  for  some  distance  along  direction  (dz,  dy),  then  t^N  =  0.  This  implies  that 
dN  is  constant  along  (dz,dy),  and,  if  dN  remains  constant  for  some  distance,  we  may  use 
the  constraint  of  general  position  to  conclude  that  N  is  also  constant  (this  is  shown  in  the 
appendix). 

We  can  now  begin  to  classify  the  surface.  If  we  observe  that  dPl  =  0  along  a  line  in  the 
image,  then  N  does  not  change  along  that  locus  and  we  have  a  surface  that  is  cylindrical 
along  that  line.  If  we  observe  that  d2/  =  0  along  a  direction  (dz,dy)  throughout  some 
region  in  the  image,  then  the  surface  is  a  cylinder  with  an  axis  whose  projection  points  in 
the  (dz,  dy)  direction.  Similarly,  if  we  observe  that  d2/  =  0  along  two  orthogonal  directions, 
then  N  does  not  change  along  either  direction  and  thus  the  surface  is  planar.  Finally,  if 
cPlj&O  in  all  directions,  then  the  surface  must  be  doubly  curved,  i.e.,  it  is  convex,  concave, 
or  a  saddle  surface.  We  can  not  distinguish  among  these  alternatives  on  the  basis  of  local 
shading  information  alone  —  a  consequence  of  the  previous  propositions.  Thus,  we  have 
the  following  proposition: 

,0Herc  the  ‘0’  in  ‘dN  -  0’  is  the  zero  vector. 
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Proposition  5.  The  surface  type  at  a  point  is  partially  determined  by  the 
number  of  directions  in  which  d2/  =  0. 

dPl  =  0  in  no  directions  =*  convex /concave /saddle  surface 
dPl  =  0  in  one  direction  =»  cylinder 
dPl  —  0  in  all  directions  =>  plane 

It  is  interesting  to  note  that  linear  intensity  gradients  do  not  invalidate  this  classification 
scheme. 

The  detection  of  lines  along  which  a  surface  is  cylindrical  is  of  considerable  importance 
because  it  is  only  at  such  cylindrical  lines  that  changes  occur  in  the  surface  type  (e.g.,  change 
from  a  convex  to  a  saddle  surface).  As  the  surface  changes  from  one  type  to  another,  the 
sign  of  at  least  one  of  the  principal  curvatures  changes  from  positive  to  negative,  or  vice 
versa.  In  the  course  of  a  sign  change  the  curvature  is  briefly  zero,  and  so  the  surface  is 
cylindrical  along  the  locus  where  the  surface  changes  type11  Thus,  lines  along  which  dPl  = 
0  are  places  where  the  surface  is  undergoing  a  change  of  type,  and  the  set  of  such  lines 
divides  the  surface  into  regions  that  are  of  the  same  surface  type. 


m.  Generalization  Of  The  Results:  Regional  Constraints 

In  real  images,  relatively  few  points  are  umbilical  and  relatively  few  surfaces  are 
Lambertian.  Therefore,  we  must  find  some  additional  constraints  in  order  to  obtain 
generally  applicable  formulas  for  surface  orientation,  illuminant  direction,  and  so  forth. 
Unfortunately,  the  thrust  of  the  preceding  propositions  is  that  there  is  no  point-wise  local 
assumption  that  will  generally  be  true;  there  will  always  be  at  least  a  two-parameter  family 
of  possible  solutions. 

One  way  we  can  obtain  additional  constraint  is  to  expand  our  view:  to  consider  regions 
rather  than  single  points  only.  Once  we  allow  discussion  of  regions,  we  find  that  there  are 
many  possibilities  for  obtaining  a  good  estimate  of  the  mean  value  of  particular  parameters 
within  the  region,  by  using  inferences  about  the  range  or  distribution  of  image  data  within 
the  region.  Having  obtained  an  estimate  for  the  mean  value  of  a  parameter,  we  can  then  solve 
for  other  parameters  by  assuming  the  already  estimated  value  —  i.e.,  by  bootstrapping. 

The  mean  value  of  a  parameter  within  a  region  may  be  used  either  to  comment  upon 
the  average  properties  of  the  region,  or  we  may  assume  that  the  parameter  is  constant 
throughout  the  region  and  thus  obtain  point-by-point  estimates.  If  we  comment  only  about 
average  properties,  then  the  validity  of  our  deductions  depends  solely  on  the  accuracy  of 
the  initial  estimate.  If  we  desire  point-by-point  estimates,  the  validity  of  our  inferences  is 
also  conditional  upon  the  intraregional  variance  of  the  estimated  parameter. 

In  the  remainder  of  this  paper  I  shall  discuss  results  obtained  by  estimating  the  mean 
value  of  one  parameter  within  a  region  and  then  using  this  estimate  as  an  assumption  to 
infer  other  properties  of  the  scene.  Examples  of  both  regional  and  point-by-point  inference 
will  be  presented.  Estimates  of  average  properties  of  a  region,  such  as  illumination  direc¬ 
tion  and  surface  type  (e.g.,  convex,  concave,  saddle,  etc.),  have  been  made  by  using  the 

1 1  If  the  change  takes  place  over  an  extended  area,  both  curvatures  will  be  zero  and  so  the  surface  will  be 
planar  instead  of  cylindric. 
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maximum-likelihood  estimate  of  change  in  surface  normal.  Point-by-point  estimates  of  sur¬ 
face  orientation  have  been  made  by  using  an  estimate  of  the  curvature  within  the  region. 

A.  Finding  The  Illuminant  Direction 

Estimating  the  illuminant  direction  is  difficult  because  image  data  are  determined  by 
both  the  surface  normal  and  the  illuminant  direction.  Since  evidence  relating  to  illuminant 
direction  is  confounded  with  the  unknown  direction  of  the  surface  normal,  estimating  the 
direction  of  illumination  seems  to  require  making  some  assumption  about  surface  orientation 
or  its  derivatives. 

One  useful  assumption  is  that  change  in  surface  orientation  (rfN)  is  distributed  isotropi¬ 
cally  within  each  image  region.  It  is  true  that  rfN  is  isotropically  distributed  when  considered 
over  all  scenes;  furthermore,  there  is  a  large  class  of  common  image  regions  for  which  rfN  is 
isotropically  distributed.  This  class  of  image  regions  includes  ail  images  of  convex  objects 
bounded  entirely  by  a  gradual  occluding  contour12  ,  such  as  the  image  of  a  smooth  pebble. 

Given  the  assumption  that  changes  in  surface  orientation  are  isotropically  distributed, 
we  can  devise  a  procedure  for  estimating  the  illuminant  direction  L  by  looking  for  the 
regular  biasing  effect  of  the  illuminant  direction  on  dl,  the  mean  value  of  dl,  along  various 
image  directions  (dx,  dy).  The  effect  of  the  illuminant  direction  is  to  make  dl,  vary  according 
to 

dl  =  p\dN  ■  L 

—  p\(dxNxL  +  dyNyL  +  dzNzL) 

where  rfN  =  (dxN,dyN,dzN)  is  the  mean  change  in  rfN  measured  in  image  direction 
(dx,dy),  and  L  =  (*{., !/{.,  *i.)  is  the  illuminant  direction.  Under  the  assumption  that  change 
in  surface  normal  is  distributed  isotropically  within  a  region,  then,  along  any  one  image 
direction  ( dx,dy )  we  find  that  rfz/ y  is  proportional  to  dx,  the  z-component  of  the  image 
direction,  that  dyN  is  proportional  to  dy,  the  y-component  of  the  image  direction,  and  that 
rfz/v  is  zero,  (see  [5])  Therefore, 


dl  =  k(xidx  +  y^dy)  (4) 

where  k  is  a  constant  determined  by  the  albedo,  illuminant  strength,  and  the  variance  of 
the  distribution  of  rfN  within  the  region. 

Using  (4),  we  can  set  up  a  linear  regression  that  employs  the  mean  of  dl  as  measured 
along  various  image  directions  to  obtain  a  maximum-likelihood  estimate  of  the  ratio  of  the 
unknowns  zt  and  yt-  This  ratio  is  the  tilt  of  the  illuminant  direction,  which  we  shall 
use  in  identifying  surface  type.  The  constant  k  (and  from  this  the  values  of  xl,  yi  and 
zi)  can  be  estimated  from  the  mean  and  variance  of  the  distribution  of  dl  along  various 
image  directions.  In  this  procedure,  most  of  the  information  about  the  illuminant  direction 
comes  from  image  points  where  there  are  large  changes  in  image  intensity,  e.g.,  edges  and 
specularities.  This  seems  to  agree  with  our  introspective  impression  as  to  how  we  determine 

l2This  may  be  proved  by  noting  that  the  surface  normals  on  such  an  object  are  perpendicular  to  V  at  the 
image  boundary  of  such  an  object,  and  thus  (given  that  the  object  is  strictly  convex)  we  may  form  a  1-1 
onto  map  between  the  surface  normals  of  the  object  and  the  Gaussian  sphere,  whicu  has  sum  dN  equal  to 


zero. 
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the  illuminant  direction.  Note  that  this  estimation  procedure  establishes  the  tilt  of  the 
iliuminant  direction  to  within  ±nf 2,  leaving  an  ambiguity  regarding  illuminant  position 
that  is  identical  to  the  human  perceptual  ambiguity  that  obtains  in  the  absence  of  cast 
shadow  information. 

Evaluation  of  the  illuminant  direction  estimator.  This  illuminant  estimation  pro¬ 
cedure  has  been  compared  with  the  answers  given  by  fifteen  human  observers  on  a  series  of 
digitized  pictures  of  natural  objects,  such  as  rocks  and  logs.  The  photographs  of  these  ob¬ 
jects  were  made  in  natural  illumination,  so  that  the  imaged  scenes  do  not  have  a  Lambertian 
reflectance  function  or  true  point-source  illumination.  Digitized  versions  of  the  pictures  were 
shown  to  the  human  subjects,  so  that  both  they  and  the  computer  procedure  would  receive 
exactly  the  same  image  information. 

Figures  4  (a)  and  (b)  show  a  comparison  of  human  and  computer  estimates  of  illuminant 
direction.  Previous  experiments  have  documented  that  the  fifteen  subjects’  mean  estimates 
exhibit  a  standard  error  of  ten  degrees  in  this  experimental  condition.  Thus,  the  human 
and  computer  estimates  shown  in  Figure  4  concur  to  within  experimental  error. 

Other  evidence  concerning  the  equivalence  of  human  and  computer  estimates  comes 
from  the  variance  of  the  two  estimates.  The  illuminant  direction  estimator  generates  a 
confidence  statistic  for  each  image,  along  with  its  estimate.  This  confidence  statistic  is 
proportional  to  the  variance  of  the  estimate  for  that  image  (given  the  assumptions  of  the 
procedure).  We  can  compare  the  variance  of  human  estimates  for  a  particular  picture  with 
the  variance  of  the  maximum-likelihood  estimate  (as  predicted  by  the  confidence  statistic). 
This  comparison  is  shown  in  Figure  4  (c).  There  is  a  correlation  of  0.63  between  the 
variance  of  the  two  sets  of  estimates,  significant  at  the  p  =  0.05  level.  The  linear  regression 
line  relating  the  human  and  maximum-likelihood  variance  is  shown  as  a  dashed  line;  the 
coefficients  of  the  regression  are  significantly  different  from  zero  at  the  p  —  0.01  level.  The 
significant  relationship  between  the  variance  of  the  two  estimation  procedures  (human  and 
computer)  shows  that,  when  one  of  them  finds  enough  information  in  the  image  to  make  a 
low-variance  estimate,  so  does  its  counterpart. 

One  of  the  images  employed  is  of  particular  importance,  because  it  is  an  example  of 
incorrect  estimation  by  humans  of  the  illuminant  direction.  When  the  image  of  the  rock 
shown  in  Figure  4  (d)  was  presented  to  human  subjects,  they  misestimated  the  illuminant 
direction  by  about  120  degrees  (it  is  actually  illuminated  from  top  left,  not  top  right  as  all  but 
two  of  the  fifteen  subjects  reported).  The  computer  generated  estimate,  interestingly,  agrees 
with  the  human  ones  —  even  though  in  both  cases  the  estimates  are  objectively  wrong.  This 
image  must  violate  the  assumptions  on  which  the  human  estimates  of  illuminant  direction 
are  based,  because  the  human  estimate  is  objectively  wrong.  The  special  significance  of  this 
case  is  that  it  also  violates  the  assumptions  of  the  computer  estimation  procedure  in  such 
a  manner  that  it  produces  exactly  the  same  estimate  as  the  human  subjects.  This  is  strong 
evidence  that  the  algorithm  people  employ  to  estimate  illuminant  direction  is  similar  to  the 
one  described  above. 

B.  Using  The  Illuminant  Direction  To  Type  The  Surface 

Once  we  have  an  estimate  of  L  for  a  region,  we  can  use  this  estimate  as  a  basis  for 
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Figure  4.  A  comparison  of  human  and  computer  eetlmatea  of  illuminant  direction  In 
Images  of  natural  objects.  Part  (a)  shows  the  comparison  for  the  tilt  component  of  the  illuminant 
direction,  and  (b)  shows  the  comparison  for  the  slant  component  of  the  illuminant  direction.  Part  (c)  shows 
the  relationship  between  the  variance  of  human  estimates  of  illuminant  direction  and  the  variance  of  the 
computer's  estimate  of  illuminant  direction.  There  is  a  correlation  of  0.63  between  the  variances  of  the  two 
sets  of  estimates,  significant  at  the  p  <  0.05  level.  The  dashed  line  is  the  linear  regression  line  relating  the 
variance  of  the  two  estimation  procedures., Part  (d)  is  a  picture  of  a  rock  for  which  both  human  estimates 
of  illuminant  direction  and  the  maximum  likelihood  estimate  agreed,  but  were  objectively  wrong.  Actual 
illumination  direction  is  top  right,  not  top  left  as  reported  by  all  but  two  of  the  fifteen  human  subjects. 

acquiring  further  information  about  the  image  formation  process.  One  important  use  of  L 
is  to  provide  sufficient  constraint  to  identify  the  surface  as  convex,  concave  or  saddle  —  thus 
completing  the  typing  of  the  surface. 

Figure  5  contains  an  example  of  the  “crater  illusion.”  In  this  image,  the  shadow 
information  is  not  prominent  enough  to  determine  the  illuminant  direction;  consequently, 
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no  matter  how  the  picture  is  turned  the  illumination  is  always  (by  default)  seen  to  be 
coming  from  above13  Thus,  the  direction  of  illumination  (relative  to  the  image)  changes  as 
the  image  is  turned  pside  down.  When  the  direction  of  illumination  changes,  the  convexity 
of  the  imaged  surface  also  changes  thereby  demonstrating  that  people  use  the  direction 
of  illumination  to  determine  the  convexity  of  the  surface. 

How  can  information  about  the  direction  of  illumination  be  used  to  determine  the 
surface  convexity’  When  we  invert  Figure  5,  the  sign  of  dl  as  we  move  toward  the  apparent 
illuminant  reverses,  because  the  perceived  image-plane  component  of  the  illuminant  direction 
shifts  by  n/ 2  radians.  Let  us  consider  what  the  sign  of  dl  tells  us  about  the  convexity  of 
the  imaged  surface. 

Equation  (2)  shows  that  the  change  in  image  intensity  dl  is  dependent  upon  </N,  the 
change  in  the  surface  normal: 


dl  =  d(pX(N  •  L))  =  p\(dN  L)  +  ?X(N  dL)  =  p\(dN  L)  (2) 


It  turns  out  that  rfN  is  always  perpendicular  to  N,  which  can  be  shown  by  observing  that 


2</N  N  =  </(N  N)  =  d(  1 )  =  0 

lsThis  is  an  example  of  the  above-mentioned  ±ir/2  ambiguity  in  illuminant  direction. 


Figure  6.  The  Crater  Illusion.  Pictures  of  craters  can  look  like  bumps  instead  of  depressions  if  we 
imagine  the  light  source  to  be  at  the  bottom  of  the  picture  rather  than  at  the  top;  to  see  this,  turn  the 
figure  upside  down.  This  picture  is  of  an  ash  cone  in  the  Hawaiian  Islands  (courtesy  of  W.  Richards). 
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Figure  6.  Estimation  of  surface  type,  (a)  For  a  convex  surface,  dN  measured  along  image  direction 
(dx,  dy)  typically  points  in  the  direction  {dx,  dy,  0),  so  that,  if  the  image  direction  ( dx ,  dy)  is  toward  the  light 
source,  then  dl  =  p\dN  ■  L  is  positive.  For  a  concave  surface,  dN  measured  along  image  direction  (dx,dy) 
typically  points  in  the  direction  (— dx,  —  dy,0),  so  that  dl  *=■  p\dN  ■  L  is  negative.  Thus,  the  sign  of  dl 
in  relation  to  the  illuminant  direction  gives  an  estimate  of  the  surface  convexity  along  that  direction,  (b) 
The  illuminant  direction  may  be  used  to  provide  sufficient  constraint  to  determine  the  qualitative  type  of 
surface.  Each  type  of  surface  has  a  generic  appearance,  which  may  be  characterized  by  the  angle  between 
rp,  the  direction  in  which  dl  =  0,  and  rt,  the  illuminant  direction.  The  distribution  of  To  —  tl  is  shown 
for  each  surface  type,  assuming  that  the  change  in  surface  normal  is  isotropically  distributed,  and  taking 
dl  >  0  to  the  right  of  to.  It  can  be  seen  that  the  appearance  of  the  different  types  does  not  overlap  much, 
so  that  a  good  identification  of  the  surface  type  within  the  region  may  be  made  from  this  angle. 


Whether  dl  is  positive  or  negative  along  a  particular  direction  depends  upon  whether 
dN  points  toward  or  away  from  the  illuminant  direction  L.  This  is  illustrated  in  Figure  6 
(a). 


If  we  assume  that  change  in  surface  orientation  is  isotropically  distributed  within  an 
image  region,  then,  for  a  convex  surface,  dl  measured  in  the  image  direction  ( dx,dy )  will 
typically  be  positive  if  (dx,dy)  is  toward  L.  The  sign  of  dl  is  positive  because,  for  a  convex 
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surface,  dN  measured  along  image  direction  {dx,  dy)  points  on  the  average  in  the  direction 
(dx.dy,  0),  so  that  dl  =  />XdN  •  L  is  positive.  In  contrast,  if  the  surface  is  concave  dl  will 
typically  be  negative  because,  for  a  concave  surface,  rfN  measured  along  image  direction 
(dx,dy)  points  on  the  average  in  the  direction  (—dx,—dy, 0),  so  that  dl  =  p\d N  -L  is 
negative.  Thus,  the  sign  of  dl  as  we  measure  toward  and  away  from  the  light  source  gives 
us  an  estimate  of  the  convexity  of  the  surface  in  that  direction. 

Unfortunately,  dN  measured  along  {dx,  dy)  does  not  usually  point  precisely  along  either 
the  direction  (dx,dy,  0)  or  {—dx,  —dy,  0).  Thus,  even  if  we  are  given  L  and  N,  there  remain 
too  many  unknown  factors  to  establish  the  surface  type  with  certainty.  Each  surface  type, 
however,  does  have  a  typical  or  generic  appearance.  Therefore,  given  the  tilt  of  L  and  the 
assumption  that  change  in  surface  orientation  is  distributed  isotropically  within  a  region, 
we  can  estimate  the  surface  type  by  observing  the  sign  of  dl  measured  toward  and  away 
from  the  illuminant. 

Sufficient  information  for  estimating  the  surface  type  is  provided  by  the  angle  between 
ro,  the  direction  along  which  dl  =  0,  and  ri,  the  tilt  of  the  illuminant  direction,  as  the 
sign  of  dl  is  positive  on  one  side  of  r0,  negative  on  the  other.  Thus,  knowing  r0  and 
enables  us  to  estimate  the  surface  type.  Figure  6  (B)  shows  the  probability  distribution  of 
r0  for  each  surface  type  given  r the  tilt  of  the  illuminant  direction14  ,  and  the  assumption 
that  surface  orientation  is  isotropically  distributed  within  the  region.  As  can  be  seen  by 
comparing  the  overlap  between  these  probability  distributions,  the  likelyhood  of  a  correct 
identification  is  quite  good. 

Note  that  the  ambiguity  of  ±ir/2  in  the  estimation  of  illuminant  direction  tilt  leads 
to  a  global  convexity/concavity  ambiguity.  Thus,  just  as  with  human  perception,  when  a 
scene  is  sufficiently  simple  as  to  make  L  uncertain,  the  direction  of  illumination  may  be 
“switched”  by  ir/2,  which  causes  all  the  convexity/concavity  determinations  to  change,  as 
in  Figure  5. 

C.  Estimation  Of  Surface  Orientation 

Although  in  real  images  relatively  few  points  are  umbilical  and  relatively  few  surfaces 
are  Lambertian,  the  solution  for  surface  tilt  turns  out  to  be  fairly  robust.  The  slant  equation, 
however,  depends  critically  on  equal  surface  curvatures  and  on  exact  knowledge  of  the 
surface  tilt.  We  must  look  further  to  find  an  estimator  of  surface  slant  that  will  be  generally 
serviceable. 

When  a  patch  of  surface  is  slanted  away  from  the  viewer,  projection  foreshortening 
occurs  along  the  direction  in  which  the  surface  tilts,  causing  an  apparent  increase  in  the 
surface  curvature  along  that  direction.  This  results  in  an  increase  in  image  curvature,  i.e., 
the  second  derivative  of  the  image  intensities.  Thus,  for  umbilical  points  (where  the  surface 
curvature  is  constant),  the  direction  in  which  the  second  derivative  of  image  intensity  is 
greatest  turns  out  to  be  the  tilt  of  the  surface.  The  slant  of  the  surface  can  be  measured 
by  the  amount  of  increase  in  image  curvature. 

The  fact  that  increasing  the  surface  slant  results  (all  else  being  constant)  in  increased 
image  curvature  suggests  that  a  measure  of  image  curvature  might  be  a  good  estimator  of 
,4These  distributions  were  determined  by  means  of  a  Monte  Carlo  simulation. 
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slant.  Image  curvature,  however,  also  depends  on  surface  albedo,  strength  of  illumination, 
surface  curvature,  and  other  factors.  Still,  if  we  investigate  a  homogeneous,  uniformly  lit 
region  of  a  natural  image  we  find  that  there  is  a  good  correlation  between  the  values  of  a 
measure  of  image  curvature  such  as  the  Laplacian  (V2/)  and  the  surface  slant.  If  the  surface 
albedo  or  the  illumination  changes,  however,  there  will  be  large  changes  in  the  Laplacian 
that  have  nothing  to  do  with  the  surface  slant  —  because  the  Laplacian  values  are  directly 
dependent  upon  the  surface  albedo  p  and  the  illuminantion  strength  X. 

If  we  divide  the  Laplacian  values  by  the  image  intensity  values,  we  can  remove  the 
dependence  on  p  and  X,  thus  eliminating  two  of  the  most  important  confounding  factors 
(see  Equations  (2)  and  (3)).  The  division  of  V2/  by  /  also  introduces  a  factor  that  is 
dependent  upon  the  illuminant  direction;  however,  this  dependency  does  not  seem  to  affect 
performance  seriously  —  especially  in  natural  imagery  where  there  is  a  large  amount  of 
diffuse  and  reflected  light.  Thus,  the  division  ofV2/  by  I  yields  a  measure  that  depends 
primarily  upon  the  surface  curvature  and  surface  slant.  Thus,  we  are  led  to  the  following 
estimator  of  surface  slant,  which  is  analyzed  in  the  appendix. 

Proposition  8.  Given  the  image  of  an  umbilical  point  on  a  Lambertian 
surface  and  /?,  the  radius  of  surface  curvature,  the  following  is  an  estimate  of 
zn,  the  z  component  of  the  surface  normal,  equal  to  the  arccosine  of  the  surface 
slant: 


This  estimate  of  surface  slant  turns  out  to  be  much  more  robust  than  the  umbilical- 
point  solution  for  surface  slant,  degrading  slowly  as  the  surface  curvatures  become  progres¬ 
sively  more  unequal  or  as  the  reflectance  function  becomes  non-Lambertian. 

Estimation  of  R.  To  use  this  estimator,  the  constant  R  must  be  determined.  A  good 
estimate  of  the  mean  R  within  an  image  region  can  be  made  by  applying  the  constraint  that 
the  resulting  zn  must  satisfy  the  inequality  0  >  z^  >  —  1  —  i.e.,  visible  surfaces  must  be 
facing  the  viewer.  We  can  determine  a  likely  value  for  R  by  using  this  constraint  and  the 
equation  for  zpj  in  light  of  the  range  of  values  of  V2///  within  a  region. 

We  can  then  assume  that  the  estimated  value  of  R  holds  throughout  the  region,  and 
thus  obtain  an  estimate  of  intraregional  slant.  If  the  variance  of  R  is  small  we  will  obtain  a 
good  estimate  of  surface  orientation.  It  can  happen,  however,  that  the  value  of  R  will  vary 
considerably  from  point  to  point  —  unless  we  can  place  bounds  on  the  range  of  R  so  that 
its  variance  is  reduced  to  an  acceptable  level. 

Using  the  values  of  cPl  to  identify  planar  and  near-planar  regions  (as  discussed  in  the 
previously),  we  can  place  a  bound  on  the  minimum  value  of  R.  We  can  also  place  a  bound 
on  the  largest  value  of  R  by  blurring  image  of  the  region  in  which  slant  is  to  be  estimated15 
Such  blurring  also  has  the  effect  of  removing  highlights,  specularities,  marks,  textures,  and 
the  like,  thus  making  the  imaged  surface  more  homogeneous  and  Lambertian.  By  bounding 

l6Noting  that  /  ©  G  =  *>X(N  •  L)  ®  G  =  pX((N  ®  G)  ■  L)  =•  />X(N  •  L)  where  G  is  a  two-dimensional 
Gaussian  and  0  designates  convolution,  we  see  that  a  smoothed  version  of  /  may  be  considered  the  image 
of  a  surface  with  normal  N  =  N@G  =  (N,  ®G,Nt  ®G,N,  ®G),  i.e.,  asmoothed  version  of  the  original 
surface. 
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R  the  variance  can  apparently  be  reduced  to  acceptable  levels16  ,  so  that  one  may  expect 
to  obtain  a  useful  estimate  of  surface  shape  within  a  homogeneous  area. 

Estimating  tilt  from  the  slant  estimates.  The  umbilical-point  solution  for  the  surface 
tilt  is  the  direction  in  which  Pi  attains  its  maximum.  To  calculate  the  maximum,  therefore, 
we  require  either  the  values  of  Pi  along  many  different  orientations,  or  quite  accurate  values 
of17  /*«,  /yy  and  Ixy.  This  is  a  fair  number  of  image  convolutions;  besides,  biological  visual 
systems  seem  to  manage  with  only  V2J-like  convolutions.  It  is,  therefore,  worth  inquiring 
whether  there  exists  a  method  of  determining  (Pi  along  a  particular  direction  from  the 
values  of  V2/. 

Let  us  examine  the  convolution  filters  required  to  calculate  Pi  and  V2/,  the  differential 
quantities  used  to  define  the  tilt  and  slant  estimates,  respectively.  We  can  calculate  the 
second  derivative  Pi  in  the  x  direction  by  convolving  the  image  with  pG{x,  y,  o )fdx2 ,  where 
G(x,  y,<r)  is  a  two-dimensional  Gaussian  in  the  variables  (x,  y)  with  variance  a.  Similarly, 
we  can  calculate  the  Laplacian  V2/  by  convolving  V2G(x,y,  <r)  with  the  image18  These  two 
filters  are  closely  related:  if  we  sum  PG(x,y,ff)ldx2  and  its  90°  rotation,  PG{x,  y,  a)/dy2, 
we  obtain  V2G(x,y,tr).  We  can  obtain  an  approximation  to  the  second-derivative  filter 
pG(x,y,a)/dx 2  by  using  a  weighted  sum  of  several  Laplacians  along  a  straight  line  in  the 
perpendicular  y  direction,  e.g., 

d?G(x,y,a)/dx 2  y^Gf(<,<r)V2G(x0,yo  +  i,o) 

€ 

where  G(<,<r)  is  a  one-dimensional  Gaussian,  and  G(x0,y 0,<r)  designates  a  Gaussian 
centered  about  the  point  (x0,  y0).  In  this  manner  we  can  obtain  a  close  approximation  to 
PG(x,  y,(r)/dx 2  from  V2G(x,  y)  filters  (see  (15),  [16]).  Applying  this  result  we  see  that  if  we 
were  to  sum  the  quantity  V2///  (the  input  data  for  the  slant  estimator)  along  a  straight  line, 
we  would  obtain  an  approximation  to  Plfl.  This  approximation  allows  us  to  compute  the 
direction  of  maximum  Pi  from  the  slant  estimation  data  without  additional  convolutions; 
we  need  only  find  the  orientation  along  which  the  sum  of  V2///  is  a  maximum. 

In  practice  this  approximation  to  Pi  results  in  slightly  better  performance  than  using 
the  filter  that  corresponds  exactly  to  Pi.  The  difference  arises  primarily  in  low-slant  regions 
where  the  slant  estimator  (and  thus  its  gradient  and  this  approximation)  is  more  stable  than 
the  straightforward  tilt  estimator. 

Evaluation  of  the  surface  orientation  estimate  with  an  analytic  model.  To 
ascertain  how  well  the  slant  and  tilt  estimators  might  be  expected  to  perform  under  ideal 

‘“Although  we  can  reduce  the  variance  of  R,  we  cannot  remove  systematic  bias.  Thus,  for  example,  if 
our  viewpoint  and  the  surface  shape  such  that  the  surface  curvature  varies  inversely  with  the  surface  slant 
(e.g.,  a  parabolic  solid  viewed  point-on),  we  will  obtain  a  poor  estimate  of  slant.  It  is  worth  noting  that 
people  also  perform  poorly  under  such  conditions.  Luckily,  such  arrangements  are  unusual  in  natural  scenes 
because  surface  slant  depends  on  viewpoint,  unlike  surface  curvature;  thus,  the  two  are  rarely  inversely 
related. 

,7Possession  of  these  three  values  allows  analytic  solution  for  the  direction  of  maximum  dPl\  the  solution 
is  shown  in  the  appendix. 

' “These  convolutions  may  be  regarded  as  calculating  the  exact  values  for  a  blurred  version  of  the  image; 
or,  as  mentioned  earlier,  the  blurred  image  may  be  regarded  as  an  exact  image  of  a  smoothed  version  of 
the  original  scene. 
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conditions,  a  computer  program  was  written  that  used  the  analytic  formulas  to  calculate 
the  derivatives  for  images  of  a  wide  range  of  ellipsoidal  solids.  A  Lambertian  reflectance 
function  was  assumed  and  a  wide  range  of  illumination  directions  used.  Three  sets  of  solid 
shapes  were  utilized.  The  first  set  consisted  only  of  spheres,  so  as  to  test  the  validity  of  the 
program.  The  second  set  consisted  of  ellipsoidal  solids  with  a  ratio  of  principal  curvatures 
which  ranged  from  2:1  through  1:1  to  1:2.  The  third  set  consisted  of  ellipsoidal  solids  with 
a  ratio  of  principal  curvatures  which  ranged  from  10:1  through  1:1  to  1:10.  Thus,  the  third 
set  of  solids  encompassed  shapes  ranging  from  almost  completely  cylindrical  to  spherical. 
Points  were  then  sampled  evenly  from  across  the  entire  imaged  surface  and  error  statistics 
computed. 

A  summary  of  results  for  the  tilt  estimator  over  the  three  sets  of  solids,  Kj  =  k2 
(a  sphere),  /c(/k2  =  2:1  (shapes  between  elongated  eggs  and  spheres,  i.e.,  common  non- 
cylindrical  shapes)  and  Ki/k2  —  10:1  (shapes  between  cylinders  and  spheres,  i.e.,  virtually 
all  ratios  of  curvatures)  is  shown  in  Table  1.  The  direction  of  surface  tilt  was  computed  by 
using  the  approximation  to  the  direction  of  maximum  (Pi  discussed  in  the  appendix.  All 
error  figures  are  given  in  radians. 

Table  1.  Tilt  estimator  over  all  surface  slants 


Ratio  Of  Curvatures 

Error:  Bias,  Variance 

Correlation 

K  i  =  Ko 

0.00,  0.052 

0.891 

k,/k2  =  2:1 

0.00,  0.097 

0.786 

ki/k2  —  10:1 

0.00,  0.114 

0.742 

Although  the  tilt  estimator  of  Proposition  3  performs  perfectly  on  spheres,  the  ap¬ 
proximation  used  here  shows  some  small  errors.  This  loss  of  accuracy  is  offset  by  the  greater 
stability  that  the  approximation  exhibits  in  low-slant  regions.  This  table  shows  that  as 
the  range  of  curvatures  increases  the  performance  of  the  estimator  degrades  considerably. 
However,  it  is  only  in  high-slant  regions  that  errors  in  the  tilt  cause  serious  miscalculations  in 
determining  the  surface  shape;  therefore,  if  the  tilt  estimator  performs  well  in  these  regions 
the  resulting  shape  estimate  will  still  be  accurate.  Table  2  summarzes  the  tilt  estimator’s 
performance  in  the  critical  high-slant  regions. 

Table  2.  Tilt  estimator  over  surface  slants  greater  than  30° 


Ratio  Of  Curvatures  Error:  Bias,  Variance  Correlation 

Kj  =  k2 

0.00,  0.020 

0.950 

ki/k2  =  2:1 

0.00,  0.066 

0.835 

Ki/*2  =  10:1 

0.00,  0.070 

0.816 

Table  2  describes  the  performance  of  the  tilt  estimator  when  the  surface  slant  is  greater 
than  30°.  It  can  be  seen  that  the  tilt  estimates  remain  quite  reasonable  for  both  the  2:1 
and  the  10:1  range  of  curvatures.  Thus,  the  tilt  estimator  makes  most  of  its  errors  in 
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low-slant  regions,  where  such  errors  are  relatively  unimportant.  Furthermore,  if  the  slant 
estimator  provides  consistent  estimates,  we  should  be  able  to  use  it  to  distinguish  between 
mo*f  reliable  and  less  reliable  tilt  estimates. 

Table  3.  The  slant  estimator 


Ratio  Of  Curvatures  Error:  Bias,  Variance  Correlation 

*1  =  *2 

0.201,  0.012 

0.924 

*i/k2  =  2:1 

0.182,  0.034 

0.843 

ki/ko  —  10:1 

0.137,  0.101 

0.674 

Table  3  shows  that  the  slant  estimator,  although  biased,  performs  quite  well  on  spheres. 
As  the  range  of  curvatures  increases,  the  performance  of  the  estimator  degrades  —  but,  even 
for  the  10:1  range  of  curvatures,  it  is  still  good.  For  all  these  cases  one  estimate  of  R  was 
used;  therefore,  in  all  except  the  case  of  spheres  the  estimated  R  is  actually  in  error  — 
in  some  instances  by  a  factor  of  10.  This  seems  to  indicate  that  the  slant  estimator  is 
remarkably  robust  . 


Table  4.  The  unbiased  slant  estimator 


Ratio  Of  Curvatures  Error:  Bias,  Variance  Correlation 

K|  =  Ko 

0.00,  0.004 

0.967 

kx/k2  =  2:1 

0.00,  0.006 

0.948 

ici/ko  =  10:1 

0.00,  0.028 

0.796 

The  bias  of  the  slant  estimator  that  appears  in  Table  3  also  shows  up  in  the  equations 
discussed  in  the  appendix;  where  it  is  also  explained  how  the  bias  may  be  removed.  When 
the  slant  estimator  is  made  unbiased,  its  accuracy  becomes  even  better,  as  shown  in  Table 
4.  It  turns  out  that  the  performance  of  the  slant  estimator  is  approximately  as  good  for 
regions  of  low  slant  as  for  those  of  higher  slant.  Therefore,  the  slant  estimate  produced  by 
this  estimator  is  useful  in  assessing  the  tilt  estimator’s  reliability. 

Evaluation  on  natural  images.  The  surface  orientation  estimator  (the  “shape  algo¬ 
rithm”)  has  been  tested  on  several  natural  images,  and  four  such  examples  will  be  presented 
here.  The  shape  algorithm  produces  estimates  of  the  surface  orientation;  it  was  found, 
however,  that  displays  of  the  estimated  surface  orientation  do  not  allow  an  observer  to 
evaluate  the  performance  of  the  algorithm  adequately.  Therefore,  for  purposes  of  exhibit¬ 
ing  the  performance  of  the  algorithm,  the  shape  algorithm’s  estimates  of  surface  orientation 
were  integrated  to  produce  a  relief  map  of  the  surface.  As  these  relief  maps  were  found  to 
give  observers  an  adequate  impression  of  the  estimated  surface  shape,  they  constitute  the 
output  shown  for  the  examples  presented  in  this  paper  even  though  integration  is  not  part 
of  the  shape  algorithm  per  ee. 

Figure  7  (a)  shows  the  image  of  a  log,  together  with  the  relief  map  generated  from 
the  shape  algorithm’s  estimates  of  surface  orientation.  Figure  7  (b)  shows  the  image  of  a 

21 


PENTLAND 


LOCAL  SHADING  ANALYSIS 


Figure  7.  Evaluation  on  two  natural  images,  (a)  An  image  of  a  log  and  the  relief  map  generated 
by  the  shape  algorithm  for  that  image,  (b)  An  image  of  a  rock  and  the  relief  map  generated  by  the  shape 
algorithm  for  that  image. 


rock,  together  with  the  relief  map  generated  from  the  shape  algorithm’s  estimates  of  surface 
orientation.  The  relief  maps  in  Figure  7  (a)  and  7  (b)  correspond  closely  to  the  actual  shapes 
of  these  two  objects.  The  reader  should  compare  his  impression  of  shape  from  the  images 
with  the  relief  maps  of  Figure  7  (a)  and  7  (b). 

Figure  8  shows  (a)  the  digitized  picture  of  a  small  portion  of  a  face  (belonging  to  a 
woman  named  Lisa),  and  (b)  a  relief  map  of  the  surface  slant  estimated  for  that  image  (eye 
and  eyebrow  regions  were  masked  out  by  hand).  No  relief  map  of  the  estimated  surface 
shape  is  shown  because  the  complexity  of  the  shape  made  it  difficult  to  integrate  the  slant 
and  tilt  estimates.  In  this  slant-map  representation  regions  with  higher  relief  face  toward 
the  viewer,  while  lower  relief  regions  face  away  from  him.  Note  that  many  important  details 
of  the  surface  shape  are  apparent  in  this  representation;  for  instance,  the  structure  of  the 
nose,  the  cheeks  and  the  eyebrow  ridges  is  plainly  visible. 

Figure  9  shows  (a)  the  digitized  image  of  Tuckerman’s  ravine  (a  skiing  region  on  Mt. 
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Figure  8.  The  LUe  Image,  (a)  The  digitized  image  of  Lisa  and  (b)  a  relief  map  showing  the  estimated 
surface  slant  in  this  image.  High-relief  areas  in  this  representation  correspond  to  regions  facing  the  viewer, 
low-relief  regions  to  regions  slanting  away  from  the  viewer.  Eye  regions  and  eyebrow  regions  were  masked 
out  by  hand.  The  integration  of  estimated  slant  and  estimated  tilt  to  show  surface  shape  proved  difficult 
because  of  the  complexity  of  the  surface.  Note  that  many  important  details  of  the  surface  shape  are  apparent 
in  t'  is  representation;  for  instance,  the  structure  of  the  nose,  cheeks  and  eyebrow  ridges  is  plainly  visible. 


Washington,  in  New  Hampshire),  (b)  a  relief  map  showing  a  side  view  of  the  estimated 
surface  shape,  obtained  by  integrating  the  slant  and  tilt  estimates.  This  relief  map  may 
be  compared  directly  with  a  topographic  map  of  the  area;  when  we  compare  the  estimated 
and  actual  shape,  we  find  that  the  roll-off  at  the  top  of  Figure  9  (b)  and  the  steepness  of 
the  estimated  surface  are  correct  for  this  surface19  ;  this  area  of  the  ravine  has  a  slope  that 
averages  60°. 

The  comparison  with  a  topographic  map  also  shows  that  the  relief  of  the  lower  right- 
hand  portion  of  the  image  is  somewhat  underestimated.  When  people  are  asked  to  look  at 
this  image,  however,  it  becomes  clear  that  they  also  fail  to  perceive  the  shape  of  the  ravine 
19And  people  ski  down  this  incline! 


Figure  0.  Tuekerman’e  Ravine,  (a)  The  digitised  image  of  Tuckerman's  ravine  and  (b)  a  relief  map 
showing  a  side  view  of  the  surface  estimated  for  this  image.  Comparing  a  topographic  map  of  the  area  with 
the  estimated  surface  shape,  we  find  that  the  roll-of!  at  the  top  of  (b)  and  the  steepness  of  the  estimated 
surface  are  correct  for  this  surface.  However,  the  relief  of  the  lower  right-hand  portion  of  the  image  is 
somewhat  underestimated.  The  underestimation  of  relief  is  similar  to  human  perception  of  this  image. 

correctly:  they  also  underestimate  the  relief  of  the  lower  right  portion  of  this  image20  . 

Evaluation  On  An  Electron  Microscope  image.  In  addition  to  natural  images, 
the  electron  microscope  (EM)  image  shown  in  Figure  10  (a)  was  selected  from  the  book 
Magnifications  by  D.  Scharf  [6j.  People  can  use  the  shading  information  in  EM  images 
to  perceive  shape,  as  Figure  10  (a)  confirms.  This  is  surprising  because  these  images  have 
a  reflectance  function  not  found  in  natural  scenes.  This  image,  therefore,  provides  a  criti¬ 
cal  test  of  the  similarity  between  the  human  use  of  shading  and  this  estimator  of  surface 
orientation. 

Ikeuchi  and  Horn  (3]  measured  the  reflectance  function  for  this  image  and  found  that 
the  image  intensities  may  be  reasonably  well  described  by 

/  =  Ar(N-V)-1 

where  k  is  approximately  0.8.  If  we  carry  out  the  required  computations,  we  see  that  the 
tilt  is  still  the  direction  along  which  <Pl  is  greatest,  and  that  the  z-component  of  the  surface 
normal  is  approximately  proportional  to  V2///,  as  in  normal  images.  We  can  thus  expect  to 

20However,  when  people  are  able  to  view  the  original  higher-resolution  image  or  the  entire  image  they 
perceive  the  surface  correctly. 
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Figure  10.  An  electron  mlerotcope  image.  Part  (a)  is  an  electron  microscope  image  of  resin  nodules 


on  a  flower  of  canruibi*  rativa,  and  the  portion  of  the  image  for  which  shape  was  estimated.  Part  (b)  shows 
a  relief  map  showing  a  side  view  of  the  surface  estimated  for  this  image.  The  fact  that  both  people  and 
this  algorithm  can  correctly  use  the  shading  information  in  electron  microscope  images  may  have  important 
implications  for  understanding  human  vision. 

obtain  a  reasonable  shape  estimate  for  EM  images  by  using  the  same  estimation  technique 
developed  for  normal  images. 

Figure  10  ( b )  the  portion  of  10  (a)  for  which  shape  was  estimated.  Figure  10  (c)  is  a 
relief  map  showing  aside  view  of  the  estimated  surface  shape,  again  obtained  by  integrating 
the  slant  and  tilt  estimates  for  this  image.  It  can  be  seen  that  the  estimated  surface  shape 
is  quite  accurate. 


IV.  Discussion 

The  preceding  portions  of  this  paper  have  shown  that  it  is  possible  to  obtain  useful 
estimates  of  scene  propert  ies  from  natural  images  by  using  a  local  analysis  of  image  shading. 
The  analysis  does  not  assume  that  any  scene  information  is  known  beforehand;  no  knowledge 
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of  scene  characteristics  or  boundary  conditions  is  used.  Thus,  the  techniques  are  applicable 
to  “raw”  images.  Because  no  a  priori  information  is  assumed,  however,  the  recovery  of 
information  about  the  scene  is  necessarily  imprecise. 

One  major  problem  which  is  inherent  in  obtaining  shape  from  shading  under  general 
viewing  conditions  is  that  the  measured  image  intensities  are  not  equal  to  the  image  ir- 
radiance.  Film,  video  camera,  and  other  image  transcription  methods  produce  image  inten¬ 
sity  measurements  which  are  (generally)  non-linear  transformations  of  the  image  irradiance. 
It  is,  therefore,  surprising  to  note  that  humans  have  no  problem  in  utilizing  the  shading  in 
such  transformed  images  even  though  the  relationship  between  the  transformed  image  and 
the  original  image  irradiance  is  unknown.  Thus,  any  shape-from-shading  technique  which 
will  be  as  generally  useful  as  the  human  capacity  must  function  despite  such  transformations 
of  the  data. 

The  shape-from-shading  techniques  described  in  this  paper  are  relatively  unaffected 
by  smoot  h,  monotonic  transformations  of  the  image  data  —  in  marked  contrast  to  previous 
methods  of  inferring  shape  from  shading.  This  robustness  is  achived  by  dividing  the 
Laplacian  of  the  image  intensities  by  the  intensities  themselves,  thus  removing  the  primary 
effects  of  any  multiplicative  terms  in  the  image  irradiance  equation.  The  division  also 
removes  the  effects  of  any  linear  scaling  of  the  image  intensity.  Thus,  division  of  the 
Laplacian  by  the  intensity  compensates  for  any  transformation  of  the  image  irradiance  which 
is  locally  approximately  linear. 

What  is  the  use  of  a  reasonably  accurate,  but  certainly  not  infallible,  local  estimate 
of  scene  properties?  Several  potential  applications  spring  to  mind:  to  provide  an  initial 
“guess”  for  a  more  global  shading  analysis  (20),  to  constrain  stereo  matching  by  providing 
a  qualitative  estimate  of  shape,  or  to  help  in  the  estimation  of  albedo.  One  other  use  that  1 
have  begun  examining  is  classification  of  the  type  of  imaged  contours.  This  serves  as  a  good 
demonstration  of  the  potential  usefulness  of  a  reasonably  accurate  local  estimate  of  surface 
orientation. 

Once  we  are  given  the  location  of  a  contour  we  should  be  able  to  use  our  local  estimate 
of  surface  orientation  to  acertain  whether  a  contour  is  a  smooth  occluding  contour  (i.e., 
a  contour  formed  by  the  surface  curving  smoothly  out  of  sight,  such  as  is  found  at  the 
edge  of  an  image  of  a  sphere)  by  checking  whether  our  estimates  of  surface  orientation  are 
appropriate  for  that  contour.  If  the  surface  adjoining  one  side  of  a  contour  has  a  large  slau* 
and  a  tilt  perpendicular  to  the  contour,  then  it  is  likely  a  smooth  occluding  contour.  On 
the  other  hand,  if  the  estimated  slant  is  small,  or  if  the  tilt  is  not  perpendicular,  then  it  is 
quite  probably  not  a  smooth  occluding  contour. 

Figure  1 1  shows  the  results  of  applying  this  typing  strategy  to  the  contours  extracted 
from  two  natural  images.  Part  (a)  of  this  figure  shows  the  Moore  sculpture  image,  and 
part  (d)  the  Tuckerman’s  ravine  h.’sage.  Parts  (b)  and  (e)  depict  the  discontinuity  contours 
found  in  these  images.  Parts  (c)  and  (f)  of  this  figure  show  the  contours  that  were  adjoined 
by  regions  whose  estimated  surface  orientation  was  consistent  with  the  contours’  being  a 
smooth  occluding  contour.  When  we  compare  the  coni  identified  as  smoothly  occluding 
with  the  original  images,  we  find  that  this  criterion  is  qui.j  apt  in  identifying  the  smooth 
occluding  contours  in  these  images. 
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Figure  11.  Results  of  typing  the  contours  extracted  from  two  nature!  Images.  Part  (a) 
of  this  figure  shows  the  Moore  sculpture  image,  and  part  (d)  shows  the  Tuckerman’s  ravine  image.  Parts 
(b)  and  (e)  show  the  discontinuity  contours  found  in  these  images.  Parts  (c)  and  (f)  show  the  contours 
which  were  adjoined  by  regions  whose  estimated  surface  orientation  was  consistent  with  the  contour  being 
«  smooth  occluding  contour.  When  we  compare  these  contours  to  the  original  images,  we  find  that  this 
criterion  does  quite  well  at  identifying  the  smooth  occluding  contours  in  these  images. 


seems  to  have  considerable  utility  as  a  model  of  one  aspect  of  the  functioning  of  biological 
visual  systems.  It  is  known  from  studies  of  neurophysiology  [8],  [9]  and  human  psychophysics 
[10],  [11]  that  the  retinal  receptive  fields  in  mammals  have  a  center-surround  organization 
that  is  well  modeled  by  the  filter  ^2G(x, y, <r).  In  addition,  the  responses  of  these  retinal 
neurons  are  logarithmically  scaled  by  the  intensity,  so  that  their  response  r  to  an  image 
point  /(*0’S/o)  can  reasonably  be  modeled  by  the  following  convolution  and  division: 

=  V2G(s,y,g)®/(x  o.yo) 

/(*  o.Vo)  (  1 

The  quantity  r  is  the  measurement  needed  by  the  slant  estimator.  Thus,  it  seems  that  the 
information  required  to  estimate  slant  is  present  in  the  output  of  the  mammalian  retina. 
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It  is  also  well  established  [12],  [13],  [14]  that  many  (perhaps  most)  cortical  neurons  in 
the  primary  visual  cortex  have  oriented  receptive  fields  whose  responses  characteristics  are 
closely  modeled  by  the  filter  d2G(x,y,ir)/dx2.  Moreover,  there  is  very  strong  evidence  (15], 
[16]  that  these  cortical  neurons  are  constructed  by  summing  the  center-surround  receptive 
fields  described  by  Equation  (5),  just  as  was  done  here.  Thus,  not  only  does  it  seem  that 
the  image  data  required  by  the  tilt  estimator  is  present  in  the  mammalian  primary  visual 
cortex,  but  the  information  is  apparently  derived  from  the  image  by  means  of  the  same 
steps  that  have  been  employed  here. 
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V.  Appendix 

In  this  section  the  previous  five  propositions  will  be  proved.  The  strategy  of  proof 
will  be  to  start  with  the  six  values  for  I,  Ix,  Iy,  Ixx,  Iyy  and  Ixy  and  then  solve  for  surface 
orientation,  curvature,  illuminant  direction  and  albedo  times  illuminant  intensity  under  the 
assumption  that  the  point  is  an  umbilical  point  on  a  Lambertian  surface.  This  solution  will, 
at  the  same  time,  prove  Propositions  1,  2,  and  4.  Some  additional  calculations  using  the 
results  of  this  solution  will  then  prove  Propositions  3,  5  and  6. 


A.  Solution  For  Umbilical  Points 


Consider  the  surface  of  a  sphere  of  radius  R: 

z(x,v)=Vr2 


This  equation,  with  R  >  0  R  >  x  >  —  R  and  R  >  y  >  —R,  describes  the  set  of  all 
umbilical  points.  From  this  equation  we  see  that  Zx  —  —xT~lI2  and  Zy  =  —yT~1/2  where 
T  =  R2  —  x2  —  jr.  Assume  that  the  illuminant  is  unknown,  so  that  we  must  consider  all 
illumination  directions  L  ==  (lirhik)-  Then,  if  the  surface  is  Lambertian,  we  have 

/(*, ,)  -  ,XN  ■  L  =  _  P\xli  _  _  (jr./2)  , , , 

yJZi  +  Zl  +  l  R 


where  p  is  the  surface  albedo  and  X  is  the  illuminant  intensity  at  the  surface.  The  first  and 


second  derivatives  are  then 

/*  =  fH,+*l3  T"'/2)  (2) 

rv  =  p-^(-h  +  yizT~l/2)  0) 

Ixx  =  +  x2I3T~3/2)  (4) 

Iyy  =  +  v2l3T~3'2)  (5) 

hy  =  P^{*vhT~Z'2)  (6) 


Assume  that  the  values  of  /,  Ix,  Iy,  Ixx,  Iyy  and  Ixy  are  known;  we  may  now  solve  for 
surface  orientation,  curvature,  iluminant  direction  and  albedo  times  illuminant  intensity. 
Solution 

Using  Equation  (6)  to  solve  for  ^  we  obtain 


p\  _  IxyT V2 
R  xyh 


(7) 


Using  Equations  (7)  and  (4)  we  obtain 

Ixx  =  (/3r_,/2  +  z2/37-3/2)^  =  (/3r-‘/2  +  x2/3r-3/2)^--  =  (— 2 

R  xyl3  xy 


)IXy  (8) 
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and  using  Equation  (7)  with  Equation  (5)  we  obtain 


lyy  =  (IzT'1!'-  +  y2/3T~3/2)—  =  (/aT-1/2  +  =  (^-y-)/,„  (9) 


Using  Equations  (8)  and  (9)  we  see  that 


zylxx  2  xVlyy  2 

T  =  -7 - *  =  -r-^-V 

Igy  ixy 


Using  Equations  (9)  and  (10)  we  see  that 


r  +  y2 


—  ^xy  "l"  ixy 

y  a: 


Letting  £  =  |  we  see  that  Equation  (11)  is  a  quadratic  in  k,  which  we  can  solve  to  obtain 


-(Ixx  -  Iyy)±  yj(lxx  ~  lyy)2  +  4 1% 


From  this  we  may  obtain  the  surface  tilt  r  =  tan-1  k  Thus  the  tilt  of  the  sphere’s  surface 
may  be  determined  without  knowledge  of  the  illuminant  direction,  the  illuminant  strength, 
the  surface  albedo  or  the  surface  curvature.  To  prove  Proposition  3,  which  stated  that  the 
tilt  of  the  surface  is  in  the  direction  of  maximum  <Pl,  it  remains  only  to  show  that  this 
solution  for  the  surface  tilt  is  the  direction  of  maximum  rf2/.  The  remainder  of  the  proof 
will  be  presented  in  the  following  subsection. 

Is  this  solution  unique?  Equation  (12)  yields  two  possible  solutions: 


Note  that  we  may  also  solve  Equation  (12)  for  A:- 1 ;  this  yields 


k~x  = 


-(Ixx-Iyy)±J{Ixx-Iyy)2  +  4nj 


Equation  (13)  also  gives  two  solutions,  k3  and  k4, 

*3  =  *r,==-  *4  =  *2,s=- 

yi  yz 

As  the  left-hand  side  of  Equation  (12)  is  the  negative  of  the  '~*t-hand  side  of  Equation  (13) 
we  find  that  either  k3  —  —  kjl,  which  leads  to  a  contradiction,  or  k3  =  — fcj*.  Thus 
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This  proves  that  the  two  solutions  of  Equation  (12)  are  perpendicular.  We  shall  see  later  that 
this  allows  us  to  discard  one  of  the  solutions,  because  it  results  in  an  iiluminant  direction 
that  is  behind  the  observed  object. 

Once  the  tilt  r  is  known,  the  only  remaining  component  of  surface  orientation  is  <r, 
the  slant  of  the  surface,  which  is  equal  to  the  arccosine  of  the  z  component  of  the  surface 
normal,  zn ■  Noting  that 

-1  -\/R2  -  x2  -  y2 

ZN  =  — -  —  = - B - 

yJZ2  +  Z2  +  l 


we  find  that  we  can  use  Equation  (10)  to  determine  the  surface  slant.  We  may  average  the 
two  expressions  for  T  in  Equation  (10)  to  obtain 


T  =  R2  -  x2-y2  — 


xyjlxx  +  /yy)  _  (*2  +  y2)  =  xyV2I  _  ( x 2  +  y2) 


21- 


xy 


21 


(14) 


*v 


If  we  add  x2  +  y2  to  both  sides  of  Equation  (14)  we  then  obtain 

2  =  xyW  (x2  +  y2) 

2  Uy  2 

Thus,  from  Equations  (14)  and  (15): 


R2  —  x2  —  y2 


R2 

xyV*t 

'r.. 

2 

I«a+ra) 

2 

xyV2 1  - 

'(x2  +  y2)/,f 

ryV2/  4 

'(**  +  y2K*„ 

*V2/- 

(*2  +  !)/„ 

*V2/  +  (*2  +  l)/„ 


(15) 


(16) 


From  z2n  we  can  obtain  the  surface  slant  a  =  cos-1  \/z%  This  concludes  the  proof  of 
Proposition  4.  Note  that  there  is  only  one  solution  for  the  surface  slant,  as  0  >  x/v  >  —  1. 
With  both  the  slant  and  tilt  determined,  we  now  know  the  surface  orientation.  Once  again, 
this  has  been  accomplished  without  prior  knowledge  of  iiluminant  direction,  iiluminant 
strength,  surface  curvature,  or  surface  albedo. 

We  may  now  proceed  to  solve  for  the  remaining  unknowns.  Because  we  know  the 
surface  orientation,  we  can  compute  \  —  xjR  and  q  =  y(R. 


X  —  cosr  sin  a  7  =  sinrsin<r 

The  quantities  \  and  7  may  be  thought  of  as  the  x  and  y  coordinates  normalized  to  the 
unit  sphere.  Using  x  and  7,  we  may  define  T,  a  unit  sphere  analogue  to  T 
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Going  back  to  Equation  (7)  we  now  know  all  of  the  terms  on  the  left-hand  side  except  f3, 
thus,  let  us  write 

/>Xf3  =  IxyT3!2  =  IxyYzt2R 
R  xu  Y7 


and  so 


X7 


We  may  then  subsitute  this  equation  into  Equations  (2)  and  (3)  to  give 

p\lt  =  _  ItR  (18) 

7 

and 

I  r R2 

pyi2  =  iniJL-  IyR  (l9) 

If  we  now  convert  Equation  (1)  to  the  variables  x>  7  and  T,  we  may  substitute  Equations 
(17),  (18)  and  (19)  into  (1)  to  obtain  a  quadratic  in  R: 

i  —  —(-xit  -  yi2  -  r'/2/3) 

—  pH-xii  -  -  r,/2/3) 

_  Izy^R2  JxyVR2  _  ,  m  Ixyr2R2  (20) 


=  -x(^^  -I,R)- 7(^ 
-(x/.  +  7/f)fl- 


7yf?)- 


We  may  solve  this  for  /?  to  obtain 


X7^(xf*  +  7 /*)  ±  \/W*  +  7/y)2  ~ 


2/,yr/?2  '  ' 

Because  L  is  a  unit  vector,  we  may  now  use  Equations  (17),  (18)  and  (19)  to  determine  (p\)2: 


(p\f  —  (pXh?  +  (pX/2)2  +  (/>X/3)2 


7*ur/?3 


-■")  •(“=-«)  *( 


/,„r3/2/?2' 


As  pX  >  0,  we  may  discard  the  negative  root  of  \fpX,  so  that  p\  is  uniquely  determined. 
Using  this  value  of  p\,  we  may  now  substitute  into  Equations  (17),  (18)  and  (19)  to  obtain 
It,  h  and  /3. 

Note  that  the  signs  of  /j,  f2  and  /3  depend  on  the  signs  of  \  and  7.  Thus,  in  solving 
Equation  (12)  one  of  the  two  solutions  will  make  \7  negative  making  /3  negative  —  which 
corresponds  to  an  illuminant  behind  the  observed  surface.  Therefore,  only  one  of  the  two 
solutions  of  (12)  is  physically  possible. 
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Because  Equation  (12)  gives  only  the  solution  for  yfx,  there  are  two  possible  pairs 
(zi.yi),  such  that  ^  =  ~  Vilx2-  These  two  solutions  are  each  other’s 

negative,  i.e.,  xi  —  — z2>  =  — y2;  consequently,  the  surface  tilts  for  these  two  solutions 

are  180°  apart  so  that  one  corresponds  to  a  convex  surface,  the  other  to  a  concave  surface. 
Because  it  is  the  signs  of  \  and  7  that  determine  the  signs  of  l\  and  /2,  choosing  one  of  the 
(x,y)  pairs  results  in  an  illuminant  direction  that  is  overhead  (i.e.,  L  ■  (0, 1,0)  >  0),  while 
picking  the  other  results  in  an  illuminant  direction  that  is  below  the  viewing  line.  Thus,  if 
we  specify  an  illuminant  direction  which  must  be  overhead  and  in  front  of  the  illuminated 
object,  there  is  only  one  possible  solution  to  Equation  (12).  The  symmetry  between  the 
signs  of  x,  y  and  l\,  1 2  is  commonly  familiar  as  the  crater  illusion,  in  which  the  convexity  of 
the  surface  changes  as  the  perceived  illuminant  direction  shifts  from  overhead  to  below  the 
viewing  line. 

We  have  now  solved  for  each  of  the  unknown  quantities,  and,  by  so  doing  have  shown 
that  to  each  set  of  measurements  I,  Ix,  /y,  /**,  lVy  and  Ixy  there  corresponds  exactly  one 
combination  of  surface  orientation,  curvature,  (overhead)  illuminant  direction,  and  factor 
p\  for  a  Lambertian  umbilical  point.  This  concludes  the  proofs  of  Propositions  1  and  2. 


B.  Proposition  3 

In  the  preceding  subsection  it  was  shown  that 

—  1  ,  y 

r  =  tan  k  —  tan-  - 


where 


(/**  —  fyy)  ±  \J{IXX  -fyy)2  +  4/*y 


2/j 


(12) 


*» 


It  remains  to  show  that  this  solution  is  equivalent  to  the  proposition  that  the  tilt  is  the 
image  direction  in  which  cPl  is  greatest. 

We  know  that,  given  Ixz,  /yy  and  Ixy  we  may  obtain  these  quantities  in  any  other 
image  plane  coordinate  system  (x*,y*)  that  is  a  rotation  of  (x,  y)  by  the  angle  £.  First  we 
note  that  . 

dy  .  r  di  dy 


dx 
dx * 


*  *  .  - 


+  lt  j  * 
dy  “V 


The  standard  rotation  transformation  is 


x  =  xc(  +  ya( 
y*  =  -xs€  +  yc( 

where  a(  and  c$  are  the  sine  and  cosine  of  the  angle  £.  The  inverse  of  this  rotation 

transformation  is  t  m 

x  —  x  —  y  S( 

y  =  1 :'s(  +  y*c( 

Thus, 

dx  dy 


dx 
dx * 


dy 
dx * 


=  H 


dy 


dy 


.  = 
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and  so 


/*  =  lgC(  +  7y«{ 

/*  =  — /*«{  +  I9C£ 


Similarly, 


_  d(Is.)  dx  d(Ix.)  dy 
■*»  »*  ■  * 


dx  fa*  dy  dx* 


_  d{Iym)  dx  d(Iy*)  dy 

Vy  dT  dy *  dy  d? 


d(Ix»)  dx  </(/,»)  dy 
*V  dx 


dy*  dy  dy* 


resulting  in 


I d"  d" 

/«*«•  “  "t"  2/jpySfC^ 


*V  y 


/«*y*  =  +  J*y(c£  ~  ) 


To  find  the  direction  for  which  <fil  attains  its  maximum,  we  find  the  angle  (  for  which  Ix-x- 
attains  a  maximum  over  all  rotations  of  the  image  plane  coordinate  system.  As  /,»,•  is 
equal  to 


Ixmx*  —  Ixzc\  d"  lyya\  +  2/,y«^C( 


the  maximum  of  /•_•  occurs  at 


d(Iz',-) 


dt 


-  (4|l  dgg)2 9(C(  +  2/Zy(c£  8() 

=  (/yy  IXx)a2{  d"  2/^yC2€ 


which  was  obtained  by  using  the  relations  sin  2£  =  2  sin  (  cos  £  and  cos  2(  —  cos2  £  —  sin2 
Solving  this  for  £  we  see  that  the  angle  (  for  which  d2/  attains  its  maximum  satisfies 


tan  2(  — 


2/: 


*y 


/»  fyjr 

Using  Equations  (4),  (5)  and  (6),  we  see  that  for  a  sphere 


tan  2^  = 


27. 


—  *v 


2 &{xyliT-3'2) 


2  xy 


Ux  -  7yy  ^(x2/3r-3/2  _  y2/3r-3/2)  x2  _  y2 


However,  noting  that  for  a  sphere  tan  r  ==  y/x  and  that  tan  2r  =  we  have 

2  tan  r  2xy 


tan2r  = 


1  —  tan2  r 


x2  —  y2 


Thus,  £  =  r  and  so  Proposition  3  is  proved. 


PENTLAND 


LOCAL  SHADING  ANALYSIS 


C.  Proposition  5 

Let  us  assume  that  we  have  observed  that  dPl  =  t^N -L  =  0  in  the  direction  (dx,  dy), 
and  that  this  situation  continues  for  some  distance  along  ( dx,dy ).  In  this  situation,  either 
(1)  is  perpendicular  to  L  or  (2)  the  magnitude  of  t^N  is  zero.  It  is  unlikely  that  (^N 
is  perpendicular  to  the  illuminant  over  any  distance;  thus,  if  we  see  that  dPl  =  0  it  must 
be  the  case  that  the  magnitude  of  is  zero.  Therefore,  dN  is  some  constant  vector  as 
we  take  a  step  along  ( dx,dy ). 

If  the  magnitude  of  dN  along  (dx,  dy)  is  zero,  then  at  least  one  of  the  surface  curvatures 
is  zero,  i.e.,  the  surface  is  cylindrical  or  planar.  If  the  magnitude  of  dN  is  not  zero,  then, 
when  we  take  a  step  in  the  direction  ( dx,dy )  there  is  some  change  in  surface  orientation.  In 
this  case,  either  the  amount  of  forshortening  that  occurs  with  each  infinitesimal  step  along 
( dx,dy )  will  change  or  N  will  be  constant,  contrary  to  the  assumption  that  dN  was  not  zero. 
If  there  is  change  in  the  foreshortening  and  yet  dN  remains  constant,  the  surface  orientation 
relative  to  the  viewer  and  the  intrinsic  surface  curvature  are  in  an  exactly  reciprocally 
relation,  which  is  a  violation  of  general  position. 

Thus,  when  dpi  —  0  for  some  distance  along  a  direction  ( dx,dy ),  the  surface  is  either 
cylindric  or  planar.  If  it  is  planar,  d2/  =  0  in  all  directions.  The  converse  —  that  if  we  also 
observe  d2/  —  0  along  the  direction  orthogonal  to  (dx,dy)  the  surface  is  then  planar  —  is 
also  true.  From  the  previous  reasoning,  dPl  =  0  implies  that  dN  is  zero  along  that  direction 
(if  general  position  is  assumed).  If  dN  is  zero  in  two  directions,  the  surface  is  planar. 


D.  Proposition  0 

Proposition  6  suggests  that  the  following  equation  is  a  useful  estimator  of  zjv,  the  z 
component  of  the  surface  normal,  equal  to  the  arccosine  of  the  surface  slant: 


^ — e(,vf£,_r.y 


where  c  is  a  constant  related  to  the  surface  curvature. 

We  may  examine  this  estimator  in  the  context  of  the  calculations  presented  so  far. 
First  we  use  Equations  (4)  and  (5)  to  find  that 


V2/  =  lxx  +  Iy,  =  p—(2lzT-'t2  +  (xz  +  y2)/3T-3/2) 

Ml 


Thus, 


V2/  =  g(2  /3r-1/2  +  (x2  +  y2)/3r-3/2) 

I  £(_*/,  _j,/2-/3r>/=)  ] 

If  we  assume  that  —  xl\  —  yl?  is  zero  (as  is  true  on  the  average,  although  not  neccessarily 
true  for  any  one  image  point)  Equation  (23)  becomes 

V2/ 

|  — |  =2r_1  +(z2  +  y2)7’_2 

=  1R2  -  -*2  -  V2  (24) 

(R2  —  x2  —  y2)2  *  ' 

R2  +  T 

“  T2 
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To  show  how  this  estimator  might  work,  let  us  first  take  c  =  T~x!2 
(R2  —  x2  —  p2)-'/2;  we  then  find  that,  in  fact, 


-,n(R2  +  T  ir'/g 

V  T2  T ) 

IT 

R2 _ 

R2  —  x2  —  p 2 

R2  *N 


The  problem  with  actually  using  this  choice  of  c  is  that  it  requires  prior  knowledge  of 
the  surface  slant.  We  must  find  a  constant  for  our  choice  of  c.  If  we  take  c  =  1/R,  then  we 
find  that 

-«I^I-C (25) 


where 


=  Rs(  J5  -  y)  -  ~  *7?) 


Because  t  is  a  term  in  z^1  the  estimated  surface  slant  will  be  larger  than  the  actual  one, 
introducing  a  bias  into  our  estimate.  This  bias  can  be  removed,  however,  as  follows.  Let 
a  =  c(|Vl  ~  c2)-1/2  be  the  estimated  surface  slant.  Then,  combining  Equations  (25)  and 
(26),  we  have 

- T. - 

Then 

*2(1  —  +  *N2)  =  ^2  =  2N 

so  that  we  obtain  a  quadratic  in  ZN 

(1  +  «a)«/v  —  «2Z/y  —  «2  ==  0  (27) 

Equation  (27)  can  then  be  solved  to  obtain  an  unbiased  estimate  of  z/v 


a 2  +  y/Aa2  +  5  a* 

‘ " - V - 2^ - 

(There  is  only  one  solution  as  0  >  z^  >  — 1.)  This  concludes  the  proof  of  Proposition  5. 

This  estimate  of  z/v,  while  unbiased,  is  not  exact  because  it  was  necessary  to  assume 
that  — x/]  —  p/2  =  0.  If  we  examine  the  conditions  under  which  this  factor  causes  significant 
error,  we  see  that  there  will  be  large  errors  only  when  all  of  the  following  conditions  occur 
simultaneously: 

(1)  The  surface  slant  is  relatively  large 

(2)  The  z-component  of  the  illuminant  direction,  / 3,  is  small 

(3)  The  surface  faces  closely  toward  or  away  from  the  illuminant. 
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Figure  12.  Blue  due  to  lliuminent  direction.  This  figure  shows  the  image  intensity  profile,  the 
profile  of  the  true  surface  shape  (a  sphere)  and  the  profile  of  the  reconstructed  surface  for  four  illumination 
conditions.  In  each  case  the  profile  is  taken  along  the  image  line  which  goes  through  the  center  of  the  sphere 
and  directly  toward  the  illuminant.  This  is  the  direction  along  which  the  estimation  errors  are  largest.  The 
distributions  of  illumination  are  extended  sources,  such  as  would  occur  if  the  imaged  sphere  were  placed  on 
a  desktop  which  was  near  a  window.  The  leftmost  distribution  shown  is  centered  directly  behind  the  ricwer 
at  (O.O.O.0, 1.0),  the  next  (proceeding  left  to  right)  is  centered  at  (0.3,0.0,0.054),  the  next  at  (0.6, 0.0, 0.8), 
and  the  rightmost  at  (0.9,0.0,0.436). 

Figure  12  shows  the  bias  due  to  illuminant  direction  which  occurs  during  the  estimation 
of  surface  orientation  for  the  image  of  a  sphere.  This  figure  shows  the  image  intensity 
profile,  the  profile  of  the  true  surface  shape  (a  sphere)  and  the  profile  of  the  reconstructed 
surface  for  four  illumination  conditions.  In  each  case  the  profile  is  taken  along  the  image 
line  which  goes  through  the  center  of  the  sphere  and  directly  toward  the  illuminant.  This 
is  the  direction  along  which  the  estimation  errors  are  largest. 

The  distributions  of  illumination  are  extended  sources,  such  as  would  occur  if  the 
imaged  sphere  were  placed  on  a  desktop  which  was  near  a  window.  The  leftmost  distribution 
shown  is  centered  directly  behind  the  viewer  at  (0.0, 0.0, 1.0),  the  next  (proceeding  left 
to  right)  is  centered  at  (0.3,0.0,0.954),  the  next  at  (0.6, 0.0, 0.8),  and  the  rightmost  at 
(0.9,0.0,0.436).  Note  that  the  rightmost  distribution  of  illumination  results  in  an  almost 
linear  gradient  across  the  image.  Thus,  these  examples  approximately  span  the  range  of 
illumination  directions  found  in  natural  scenes. 

Comparing  the  true  surface  shape,  shown  across  the  bottom  of  Figure  12,  to  the 
reconstructed  surface  shape21  shown  across  the  middle  of  Figure  12,  we  see  that  the  bias 
due  to  illuminant  direction  does  not  cause  large  errors 


21  As  obtained  by  integrating  the  estimated  surface  orientation. 
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ABSTRACT 

The  image  irradiance  equation  constrains  the  relationship  between  surface  orientation 
in  a  scene  and  the  irradiance  of  its  image.  This  equation  requires  detailed  knowledge  of 
both  the  scene  illumination  and  the  reflectance  of  the  surface  material.  For  this  equation 
to  be  used  to  recover  surface  orientation  from  image  irradiance,  additional  constraints  are 
necessary.  The  constraints  usually  employed  require  that  the  recovered  surface  be  smooth. 
We  demonstrate  that  smoothness  is  not  sufficient  for  this  task. 

A  new  formulation  of  shape  from  shading  is  presented  in  which  surface  orientation  is 
related  to  image  irradiance  without  requiring  detailed  knowledge  of  the  scene  illumination,  or 
of  the  albedo  of  the  surface  material.  This  formulation,  which  assumes  isotropic  scattering, 
provides  some  interesting  performance  parallels  to  those  exhibited  by  the  human  visual 
system. 


1  INTRODUCTION 

Most  previous  work  [1-8]  on  the  problem  of  recovering  surface  shape  from  image  shading 
has  been  based  on  solving  the  image  irradiance  equation,  which  relates  the  radiance  of  a 
scene  to  the  irradiance  of  its  image  [1,2].*  This  formulation  of  the  relationship  between 
scene  radiance  and  image  irradiance  is  embodied  in  a  first-order  partial  differential  equation 
expressing  scene  depth  as  a  function  of  image  coordinates.  Such  a  formulation  requires 
specific  knowledge  of  not  only  the  reflectance  characteristics  of  the  surfaces  in  the  scene, 
but  also  the  position  and  strength  of  illumination  sources.  The  approaches  to  solving 
this  differential  equation  have  generally  been  either  by  direct  integration  [1]  or  through 
an  iterative  algorithm  that  attempts  to  reduce  the  difference  between  the  predicted  image 
irradiance  and  the  measured  value  [5-7],  Our  interest  is  in  the  iterative  approach  because 
the  alternative  to  it  —  direct  integration  —  requires  specific  boundary  conditions  that  are 
generally  unknown  (in  natural  scenes),  and  its  behavior  when  applied  to  noisy  pictures,  is 
uncertain. 

As  the  image  irradiance  equation  is  a  single  equation  relating  image  irradiance  and  two 
independent  variables  (specifying  surface  orientation),  it  does  not  uniquely  determine  the 
two  independent  variables  for  a  given  value  of  image  irradiance.  Consequently,  when  this 
equation  is  used  to  recover  surface  shape  additional  constraints  are  necessary.  These  may 

'Image  irradiance  is  the  light  flux  per  unit  area  falling  on  the  image,  i.e.,  incident  flux  density.  Scene 
radiance  is  the  light  flux  per  unit  projected  area  per  unit  solid  angle  emitted  from  the  scene,  i.e.,  emitted 
flux  density  per  unit  solid  angle. 
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bo  imposed  by  boundary  renditions,  by  restrictions  on  the  type  of  surface  to  be  recovered, 
or  by  a  combination  of  the  two.  For  some  images,  when  we  can  determine  important 
features  (such  as  the  fact  that  an  edge  is  an  occlusion  boundary  caused  by  a  surface  turning 
smoothly  away  from  the  viewing  direction),  we  can  use  boundary  conditions  to  constrain  the 
solution;  in  large  portions  of  the  image,  however,  we  can  say  something  only  about  the  type 
of  surface  we  would  like  to  recover.  To  date  surface  smoothness  is  the  weakest  additional 
assumption  that  has  allowed  surface  shape  to  be  recovered.  Smoothness  normally  signifies 
that  the  surface  is  continuous  and  that  it  is  once  or  twice  differentiable.  Smoothness,  as  the 
additional  assumption,  has  had  to  play  the  role  of  propagator  of  boundary  conditions  and 
selector  of  the  surface  to  be  recovered.  Is  smoothness  capable  of  these  tasks  in  general  or  is 
its  usefulness  limited  to  special  cases? 

In  the  first  part  of  this  paper  we  describe  the  various  formulations  that  have  employed 
smoothness,  including  a  relaxation  procedure  of  our  own  that  resembles  its  counterpart 
in  engineering;  we  then  present  results  of  our  experiments  with  these  iterative  procedures. 
Assessing  the  usefulness  of  smoothness  in  this  context,  we  conjecture  as  to  its  utility  in  other 
shape-from-shading  formulations. 

Not  all  authors  have  used  smoothness  as  their  additional  constraint;  some  have 
employed  assumptions  about  surface  shape  instead.  The  assumption  that  the  surface  is 
locally  spherical,  i.e.,  that  its  curvature  is  independent  of  direction,  is  strong  enough  to 
allow  but  a  single  interpretation  for  the  surface  orientation,  and  at  the  same  time,  it  is  also 
one  that  enables  recovery  of  the  surface  orientation  by  purely  local  computation  [8).  In 
addition,  this  shape  constraint  eliminates  the  need  to  know  such  parameters  as  illuminant 
direction  and  surface  albedo.2  Assumptions  about  shape  are  being  traded  for  assumptions 
about  reflect  ance  behavior.  Can  we  formulate  the  shape-from-shading  problem  without 
having  to  know  the  details  of  the  surface  reflectance  and  without  making  any  assumptions 
about  the  shape  of  the  surface  we  wish  to  recover? 

In  the  new  formulation  presented  in  the  second  part  of  the  paper,  we  assume  that  scene 
materials  scatter  light  isotropically.  We  make  no  assumptions  about  surface  shape  and  we 
do  not  need  to  know  the  parameters  specifying  illuminant  direction,  illuminant  strength, 
and  surface  albedo.  Our  assumptions  are  about  the  properties  of  reflection  in  the  world; 
these  alone  are  sufficient  to  relate  surface  orientation  to  image  irradiance.  In  situations 
in  which  the  assumption  of  isotropic  scattering  is  invalid,  the  formulation  provides  some 
interesting  parallels  to  human  vision. 


2  ITERATIVE  FORMULATIONS  FOR  SURFACE  RECOVERY 

The  image  irradiance  equation  as  presented  by  Horn  [2],  is 

I(x,y)  =  R(p,q)  , 

where  !{x,y)  is  the  image  irradiance  as  a  function  of  the  image  coordinates  x  and  y,  and 
R(p. q)  is  the  surface  radiance  as  a  function  of  p  and  q,  the  derivatives  of  depth  with  respect 

^Surface  albedo  is  the  material  reflectance,  i.e.,  the  ratio  of  scene  radiance  to  scene  irradiance. 
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to  the  image  coordinates.  To  derive  this  equation  orthographic  projection  is  assumed;  while 
orthographic  projection  is  inadequate  to  describe  image  formation  it  is  a  good  approximation 
when  the  scene  objects  are  small  compared  to  the  viewing  distance.  In  the  shape-from- 
shading  approach,  it  is  generally  assumed  that  R(p,q)  is  known  for  all  p  and  q  (that  is,  the 
reflectance  map  is  specified).  The  iterative  approach  applies  this  equation  on  a  pixel-by-pixe! 
basis,  that  is,  for  pixel  («,  j) 

A'.j  =  R(pij,qi,j)  > 

where  7,  y  is  the  image  irradiance  for  («,  j>)th  pixel  and  p,j ,  qij  is  the  surface  orientation  of 
the  surface  patch  that  is  imaged  at  pixel  (»,_/).  For  convenience  we  use  the  notation 

Rij  —  R(pi,j,qij) 

If,  at  some  stage  of  the  iterative  procedure,  we  have  assigned  particular  Pij,qij  as  the 
surface  orientation  of  the  (»,j)th  pixel,  then  the  residual  expression 

=  (Ii,j  ~~  Ri,j)2 

specifies  the  error  caused  by  our  assignment  of  surface  orientation.3  If  this  were  our  only 
constraint,  we  could  select  Pij,qij  so  that  {ijR  =  0.  This  would  guarantee  that  the 
image  irradiance  equation  is  satisfied  pixel  by  pixel,  but,  because  there  are  infinitely  many 
solutions,  we  need  further  constraints  to  reduce  the  number  of  possible  solutions. 

Smoothness  is  usually  introduced  by  specifying  a  relationship  that  we  would  like  to 
have  hold  between  the  surface  orientation  of  the  («,  j)th  pixel  and  its  neighbors.  The  various 
iterative  approaches  [5-7]  differ  in  the  way  this  relationship  is  specified.  Of  course,  at  a 
particular  stage  of  the  iterative  process  this  relationship  between  a  pixel  and  its  neighbors 
will  not  be  exact.  Once  again  we  can  specify  a  residual  equation  for  the  error  in  the 
smoothness  relation. 

Zi.jS  —  [/( P i.j>  qij <Pi-t  Jt  qi- l,j .  Pi+ 1  ,j,  ?»+  i,j >  Pi,j- 1 » 9«,y- 1 .  Pi,j+ 1 1  0»  J+  I .  "-)]2  > 

where  /  is  the  relationship  between  the  surface  orientation  at  (i,j)  and  its  neighbors.  An 
example  of  the  type  of  relationship  is  the  difference  between  the  surface  orientation  of  pixel 
(i.j)  and  the  mean  value  of  the  surface  orientations  of  its  four-neighbors. 

We  have  two  constraints  that  need  to  be  satisfied  simultaneously,  —  one  from  image 
irradiance  and  one  from  surface  smoothness.  At  each  stage  of  the  iterative  process,  the  total 
residual  error  for  pixel  (»,/)  can  be  described  by 

Si.j  =  Xfcj*  +  {,/  , 

where  X  is  a  weight  ing  factor  that  can  adjust  the  influence  of  the  error  in  image  irradiance 
to  the  error  in  smoothness.4  For  the  image,  the  total  residual  error  is 

e  =  • 

U 


3The  f^rm  of  the  error  need  not  be  quadratic  —  the  goals  of  such  a  choice  include  simple  final  expressions. 

4Since  the  error  in  image  irradiance  is  not  necess3riljr  commensurate  with  that  in  surface  smoothness,  some 

form  of  normalisation  is  required. 
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The  allocation  of  surface  orientations  to  all  pixels  should  minimize  this  total  error,  that  is, 


d£ 

dpij 


=  0  , 


dqij 

Differentiating  £  with  respect  to  pij  and  also  with  respect  to  <7,j  gives  two  equations  for 
each  pixel  in  the  image.  While  complicated  forms  of  the  relationship  between  Pi,j,qi,j  and 
their  neighboring  pixels  will  generally  occur,  we  choose  our  smoothness  relation  so  that  we 
can  arrange  the  equations  in  open  form 

Pij  =  Fi(pij,qij,  and  p's  and  q's  o  f  neighboring  pixels)  , 

qtj  =  F2(pi,j,qi,j ,  and  p's  and  q's  of  neighboring  pixels)  , 

where  F\,  and  F2  are  functions. 

We  therefore  have  an  iterative  scheme  that,  given  some  initial  solution,  we  improve  by 
reducing  the  residual  error  in  image  irradiance  and  surface  smoothness.  We  need  to  ask  the 
following  questions  of  such  a  scheme:  Under  what  conditions  will  it  converge  to  a  solution? 
Is  that  solution  unique?  Does  smoothness,  as  defined  by  our  relation,  give  us  the  type  of 
surface  we  want? 


3  SURFACE  ORIENTATION 

There  are  many  equivalent  parameterizations  of  surface  orientation.  Mentioned  pre¬ 
viously  were  the  parameters  p  and  q,  the  derivatives  of  depth  with  respect  to  image  coor¬ 
dinates.  Some  authors  prefer  using  slant  and  tilt  to  specify  surface  orientation.  Slant  is  the 
angle  between  t  he  surface  normal  and  the  viewing  direction,  while  tilt  is  the  angle  between 
the  image  x  axis  and  the  projection  of  the  surface  normal  onto  the  image  plane.  Other 
parameterizations  [7)  have  been  used  when  particular  properties  of  the  parameterization 
arc  to  be  exploited.  The  parameters  wc  use  are  /  and  m:5 

l  =  sin  a  cos  r  , 
m  =  sin  a  sin  r  , 

where  n  is  the  surface  slant  and  r  its  tilt,  /  is  the  component  of  the  surface  normal  in  the 
direction  of  the  x  axis,  and  m  is  the  component  in  the  y  direction.  We  select  this  particular 
parameterization,  as  /  and  m  are  bounded 

0  <  l2  +  m2  <  1  . 

For  surfaces  that  we  can  see, 

^  JT 

°<<T  <  -  , 


r'l  and  m  are  related  to  p  and  q. 


vA 


,m 


_ -I- 
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0  <  r  <  2?r 

Consequently  /  and  m  specify  the  surface  normal  of  an  imaged  surface  without  ambiguity. 


4  FORMULATIONS  USED  FOR  SURFACE  RECOVERY 


To  explore  the  issues  of  convergence,  propagation  of  boundary  conditions,  and  the 
type  cf  surface  promoted  by  smoothness,  we  formulate  the  problem  in  two  ways:  one 
that  parallells  the  technique  previously  described  and,  alternatively,  one  that  resembles  the 
relaxation  method  used  to  solve  structural  engineering  problems. 

The  function  for  scene  radiance,  used  to  create  synthetic  images  for  the  experiment  and 
employed  by  the  shape  recovery  algorithms,  is 

1  4*  \f  1  —  —  frj2  *■■■ 

/?(/.  m)  =  0.1569( - -  — - - )  +  Maz[0.4437v  1  -  P  -  m2  +  0.3137/  +  0.3137m,  0], 

This  function  is  appropriate  for  a  scene  that  exhibits  Lambertian  reflectance  and  is  il¬ 
luminated  by  both  a  collimated  source  and  a  uniform  hemispherical  source.  This  illumina¬ 
tion  was  selected  because  it  is  typical  of  the  illumination  of  outdoor  scenes.  The  particular 
numerical  constants  specify  the  light  direction  and  intensity,  and  the  surface  albedo. 

The  first  formulation  is  similar  to  that  described  previously;  we  shall  call  this  the 
‘conventional’  formulation.  From  the  image  irradiance  equation  we  have  the  error  term 


The  smoothness  constraint  is  the  requirement  that  /,-y  be  the  average  of  its  four- 
neighbors,  and  that  m,-y  be  the  average  of  its  four-neighbors.  The  error  term  for  smoothness 
is 

c  s  II  h—l ,j  +  U+l,j  +  U,j-l  +  h,j+ 1  \2 

Zi.j  —  [U.j  ~  ) 

,  i _  +  mi+hj  +  m,,y_i  +  mt|y+i 

'Vmij  ^  ) 

Note  that  this  constraint  is  exact  for  a  surface  that  is  spherical. 

Minimizing  £  ==  +  £,-yS  by  differentiating  with  respect  to  /,-y,  and  with 

respect  to  m,,y,  and  then  setting  each  result  equal  to  zero,  we  obtain  the  expressions 


kj  = 


mu  = 


0-4(/»~i,y  +  h+t.j  +  U,j- 1  +  U,j+ 1)~ 

01  (/.-i,/-,  +  lf+ij+i  +  li-ij+i  +  l»+i,i-i)  — 

0  05(/,_2.y  +  /,+2.i  +  /»,j— 2  +  /«,;+ 2)  + 

a  d 

O.SXdij  -  Rij)^ 

0.4(m,_,  y  +  m,.+  1,y  +  ntij-i  +  ~ 

0.1(m,_ltJ_,  +  m,+  tiJ+l  +  m.-u+i  +  ml+iii_,)  - 
0.05(m,_2j  +  rn,+2j  +  m,-ry_2  +  m.j+o)  + 

/? 

0  8X(/tiJI  —  /?ij)  — — 

3m  w 
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We  use  these  (together  with  the  expression  for  R(l,m))  as  our  iterative  scheme  to  improve 
on  an  intial  solution. 

The  other  formulation  we  use,  the  ‘engineering’  formulation,  creates  error  terms  from 
the  image  irradiance  equation  and  the  smoothness  constraints,  but  does  not  combine  these 
into  one  term. 

£»,i  =  ( A‘,  j  —  Ri.j )  > 

-  (I,.,  -  '■>  +  '<* 't  t  )  , 

,  5,  ,  Wl»-1  ,j  +  d  \ 

C»,j  —  ( m«,i  ^  J 

We  view  the  as  residuals  and  apply  the  relaxation  approach,  i.e.,  reduction  of  the  largest 
residuals.  If  or  £ijSi  is  selected  for  reduction  we  choose  to  reduce  both,  as  each  is 

independent  of  the  other.  When  is  chosen  for  reduction,  we  do  the  reduction  in  two 
stages  —  one  stage  altering  /,j  and  the  other  m,j.  Of  course  we  can  scale  the  residuals, 
reduce  them  from,  say,  the  image  irradiance  equation  to  a  certain  level  before  introducing 
smoothness,  vary  the  amount  of  correction  we  apply,  (e.g.,  we  can  overrelax)  and  the  like. 
In  fact,  we  can  experiment  with  various  relaxation  approaches.  In  this  formulation  major 
changes  in  the  relaxation  scheme  generally  require  minor  programming  changes. 


5  EXPERIMENTAL  RESULTS 

The  test  image  shown  in  Figure  1  is  that  of  a  hemisphere  placed  on  a  plane,  i.e.,  a 
synthetic  image  generated  by  the  reflectance  function  previously  described.  The  collimated 
light  source  is  at  slant  *  and  tilt  *  —  which  means  that  it  is  at  the  upper  right  as  we 
view  the  image.  We  purposely  avoided  the  case  in  which  the  collimated  source  is  at  the 
same  position  as  the  viewer,  since  the  resulting  symmetric  reflectance  map  might  bias  the 
algorithm  to  return  a  symmetric  surface.  A  synthetic  image  of  a  sphere  was  selected  as 
the  test  image  because  both  the  image  irradiance  equation  and  the  smoothness  relationship 
we  use  hold  exactly.6  The  performance  of  the  algorithm  to  recover  the  surface  shape  could 
be  assessed  without  the  complications  involved  in  using  inexact  models  for  reflectance  and 
smoothness. 

We  need  initial  solutions  to  start  our  iterative/relaxation  procedures.  We  used  four  sets 
of  initial  conditions:  ( 1 )  a  plane  perpendicular  to  the  viewing  direction;  (2)  a  plane  slanted  J 
to  the  viewing  direction;  (3)  a  cone  with  its  axis  along  the  viewing  direction;  (4)  the  correct 
solution  perturbed  by  small  random  errors. 

Previous  work  has  used  boundary  conditions  to  constrain  the  recovered  surface. 
Investigating  this  approach,  we  constrained  the  surface  in  various  ways:  at  the  edge  of  the 
hemisphere,  at  a  closed  curve  lying  on  the  sphere’s  surface,  or  at  individual  points  on  the 
sphere’s  surface.  We  also  used  the  algorithms  without  any  boundary  conditions  whatsoever. 

Since  we  wished  to  investigate  the  extent  to  which  smoothness  could  propagate  bound¬ 
ary  conditions,  we  used  various  image  quantizations,  namely  16  X  16,  32  X  32,  and  64  X  64. 

eThe  smoothness  relationship  does  not  hold  at  the  edge  of  the  hemisphere  where  it  joins  the  plane. 
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The  findings  can  be  characterized  as  follows: 

•  Both  techniques  —  the  engineering  and  the  conventional  method  —  gave  essentially  the 

same  results. 

•  The  engineering  technique  converged  much  faster  than  the  conventional  technique. 

•  Smoothness  propagates  boundary  conditions  by  no  more  than  a  few  pixels 

•  The  initial  solution  largely  predetermines  the  final  one. 

Figures  3-11  display  examples  of  the  results  we  achieved  with  the  conventional  iterative 
scheme;  the  engineering  scheme  gave  essentially  the  same  results.  In  each  of  these  figures 
the  top  left  picture  shows  the  profile  of  the  recovered  surface  (viewed  from  the  bottom 
left  corner)  ,  while  at  the  top  right  we  find  an  image  that  is  the  sine  of  the  surface  slant, 
with  black  representing  0  and  white  1.  The  bottom  left  is  the  cosine  and  the  bottom  right 
the  sine  of  the  surface  tilt,  with  black  representing  -1,  gray  0,  and  white  +1.  The  results 
are  presented  in  this  manner  so  that  the  performance  of  the  algorithms  can  be  evaluated. 
The  profile  can  on  occasion  appear  more  accurate  than  the  individual  surface  orientations 
(as  might  be  expected  of  an  integration  procedure);  at  other  times,  however,  errors  in  the 
surface  orient  ation  (sometimes  just  from  the  image  quantization)  of  highly  slanted  surfaces 
cause  the  integration  routine  that  produces  the  surface  shape  for  profiling  to  overstate  the 
error.  Figure  2  shows  the  results  that  should  be  obtained  if  the  shape  recovery  algorithms 
recovered  the  surface  exactly. 

Figures  3-6  illustrate  the  effects  of  various  boundary  conditions.  The  errors  at  the  edge 
of  the  sphere  where  it  joins  the  plane  are  expected,  as  smoothness  does  not  hold  there.  Each 
figure  is  the  result  of  320  iterations,  this  being  five  times  the  linear  dimension  of  the  picture 
used.  The  boundary  condition  at  a  point  affects  an  area  of  approximately  10  pixels  in  radius. 
Only  for  Figure  6,  where  a  random  five  percent  of  pixels  were  set  to  their  correct  values,  is 
the  surface  shape  recovered  correctly.  Smoothness  as  a  propagator  affects  but  a  small  area. 
Figures  4,  7,  and  8  illustrate  this  point  further.  Here  various  image  sizes  are  used.  Observe 
that,  as  the  image  size  increases,  the  boundary  conditions  diminish  in  their  effect  and  the 
solution  becomes  progressively  worse.  Figures  4,  9,  and  10  reveal  the  dominant  influence  of 
the  initial  solution.  Figure  11  is  included  to  show  the  effect  of  smoothness  when  X  =  0  — 
namely,  when  image  irradiance  does  not  affect  the  solution  at  all.  This  figure,  obtained  after 
320  iterations,  demonstrates  what  smoothness  alone  can  achieve,  even  when  the  definition 
of  smoothucss  is  exact  for  the  viewed  scene  (a  sphere). 

Smoothness  is  a  poor  selector  of  surface  shape  and  a  poor  propagator  of  boundary 
information  when  it  is  used  to  tie  the  surface  orientation  of  a  particular  surface  point  to  those 
of  its  neighbors.  Generally,  in  engineering,  problems  solved  with  relaxation  techniques  are 
formulations  that  relate  a  given  property  at  one  point  to  that  same  property  at  neighboring 
points  by  means  of  differential  relations.  It  is  the  derivative  that  propagates  boundary 
information  and  selects  a  particular  solution  to  be  recovered.  We  present  such  a  formulation 
below  in  an  attempt  to  relieve  smoothness  of  its  role  as  propagator  and  selector. 

6  SURFACE  RADIANCE  AND  ISOTROPIC  SCATTERING 

Our  formulation  of  the  relationship  between  image  irradiance  and  scene  radiance  is 
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I(x,  y)  =  /?(/,  m)  , 

where  Hx,y)  is  the  image  irradiance  at  image  point  x,y  and  R(l,m)  is  the  scene  radiance 
for  a  surface  normal  we  represent  by  l,m.  R  is  a  function  of  the  components  of  the 
surface  normal  and  they,  in  turn,  are  functions  of  image  coordinates.  R(l,m)  specifies 
the  relationship  between  surface  radiance  and  surface  orientation,  while  l(x,y)  and  m{x,y) 
specify  the  relationship  between  surface  orientation  and  image  coordinates.  /?(/,  m)  embodies 
knowledge  of  the  nature  of  surface  reflection,  while  l(x,y)  and  m(x,  y)  embody  the  surface 
shape. 

To  provide  the  additional  constraints  we  need  for  relating  surface  orientation  to  image 
irradiance,  we  introduce  constraints  that  relate  properties  of  R(l,m),  —  that  is,  constraints 
that  specify  the  relationship  between  surface  radiance  and  surface  orientation.  Such  con¬ 
straints  are 

(1  -  l2)Ru  =  (1  -  rn2)Rmm  , 

(Ru-Rmm)lm  =  (t*-m2)Rlm  , 

where  Ru  is  the  second  partial  derivative  of  R  with  respect  to  /,  Rmm  is  the  second  partial 
derivative  of  R  with  respect  to  m,  and  Rtm  is  the  second  partial  cross-derivative  of  R  with 
respect  to  /  and  m. 

These  two  partial  differential  equations  embody  the  assumption  of  isotropic  scattering 
(Lambertian  reflectance).  For  isotropic  scattering  R(l,m)  has  the  form 

R(l,m)  =  al  +  bm  +  c\f\~—  P  —  m2  +  d  , 

where  a,A,r,  and  d  are  constants,  their  values  depending  on  illumination  conditions  and 
surface  albedo.  Note  that  l,m,  and  yj  1  —l2  —  rrfi  are  the  components  of  the  unit  surface 
normal  in  the  directions  x,y,  and  depth.  R(l,m )  can  be  viewed  as  the  dot  product  of 
the  surface  normal  vector  (l.m,  v/l  —  l2  —  m2)  and  a  vector  (a,b,c)  denoting  illumination 
conditions.  As  the  value  of  a  dot  product  is  rotationally  independent  of  the  coordinate 
system,  the  scene  radiance  is  independent  of  the  viewing  direction  —  which  is  the  definition 
of  isotropic  scattering. 

It  is  easily  seen  that  R(t,  m)  =  al  +  bm  +  ey/l  —  P  —  m2  -f  d  satisfies  the  pair  of  partial 
differential  equations  given  above.  In  the  appendix  we  show  that  R(l,m)  =  al  +  bm  + 
cx/1  —  i~  —  m2  +  d  is  the  solution  of  the  pair  of  partial  differential  equations.  These  partial 
differential  equations  are  an  alternative  definition  of  isotropic  scattering. 

It  is  worthy  of  note  that  R(l,m)  =  al  +  bm  +  cy/l  —  P  —  m2  +  d  includes  radiance 
functions  for  multiple  and  extended  illumination  sources,  including  that  for  a  hemispherical 
uniform  source  such  as  the  sky.  The  assumption  of  isotropic  scattering  restricts  the  class 
of  material  surfaces  being  considered,  not  the  illumination  conditions. 

7  EQUATIONS  RELATING  SURFACE  ORIENTATION  TO  IMAGE 
IRRADIANCE 

Differentiating 

I(x,y)  =  R(l,m) 
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with  respect  to  x  and  y,  we  obtain 


Ix  —  Rl^x  +  Rmmx  > 


Iy  =  Rl  ly  RmTTly  , 

Ixx  =  Rll^x  "F  Rmmmx  "F  ^Rtm^x^x  "F  Rt^xx  "F  Rm^xx  t 

Iyy  —  Rltly  +  Rmm*Tly  +  2R[mly1Tly  +  Z?(/yy  +  RmTTlyy  , 

Ixy  ==  Rll^x^y  "F  Rinm^x  Wy  “F  Rtmi^x^y  "F  ^y^i)  “F  Rrfxy  "f"  Rm^xy  > 

where  subscripted  variables  denote  partial  differentation  with  respect  to  the  subscript(s). 
From  the  constraints  for  isotropic  scattering,  we  derive  the  relationships 

1  —  m2 

Rtt  —  Rim  > 

_  l-/2 

Rmm  —  7  ftm 

Im 

Substituting  these  relationships  for  Ru  and  Rmm  in  the  expressions  for  /2I)/yy,and  Ixy, 
we  obtain 


l'*2(  ~~Im~ ]  +  mz2{hr^)  +  “  7«  -  R‘1**  ~  R"m** 

2,  1  ~  Wl2.  •>.  1  —  f2^  .. 


['v  (  "  )  -F  my  (  im  )  +  2lymy]Rtm  <=»  /yy  —  Rilyy  —  RmrnVy  , 

J  _  ^2  |  |2 

[^*^y(  )  "F  my(  ^  )  +  f*Wly  +  fyrn2]/?(m  =  /*y  —  Rllxy  Rmfflxy 


/rf|  ;  -r  J  t  •*"*»  •  =  i*y  -  /qi*y  ~  «mrn,y  . 

By  removing  and  substituting  the  expressions  for  /?j  and  i?m,  defined  by  the 
expressions  for  /*  and  /y,  we  produce  two  partial  differential  equations  relating  surface 
orientation  to  image  irradiance: 

~  t»7^*y  ~  fonity  =  -  x7/*y  , 

O 0 /y y  4*  ft OfTlyy  “  Cx6  Ijy  b  TTlxy  X^^Xy  » 

where 

O  I X tlly  IyfTlx  , 

d  =  Iylx  ~  Ixly  y 

7  =  f»2(l  -  m2)  +  m,2(l  - /2)  + , 
i  =»  fy2(l  —  m2)  +  my2(l  —  Z2)  +  2lymvlm  , 

9  =  My(l  -  m2)  +  mzrriy(l  -  Z2)  +  (lzmy  +  lymx)lm  , 

X  =  Z*rriy  lymx 

These  equations  relate  surface  orientation  to  image  irradiance  by  parameter-free  ex¬ 
pressions.  They  involve  the  derivatives  of  image  irradiance,  but  not  the  image  irradiance 
itself  —  an  important  feature  if  we  conjecture  such  a  model  for  the  human  visual  system. 
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8  RECOVERY  OF  SURFACE  SHAPE  —  A  SPECIAL  CASE: 

A  SPHERICAL  SURFACE 

It  is  difficult  to  solve  the  equations  relating  surface  orientation  to  image  irradiance,  and 
thus  to  recover  surface  shape  from  observed  image  irradiance.  Two  types  of  approaches  are 
possible.  The  two  differential  equations  can  be  integrated  in  a  step-by-step  manner  or,  given 
some  initial  solution,  a  relaxation  procedure  may  be  employed.  The  difficulties  that  arise 
are  two-fold,  numerical  errors  and  multiple  solutions. 

Solutions  of  the  equation  \  =  0  (the  developable  surfaces,  e.g.,  a  cylinder)  are  also 
solutions  of  the  equations  relating  surface  orientation  to  image  irradiance.  If  the  images 
intensities  were  known  in  analytic  form  then  analytic  solution  of  the  equations  could  employ 
boundary  conditions  to  select  the  appropiate  solution.  However  since  the  analytic  form 
for  the  image  intensities  is  unknown,  numerical  procedures  must  be  employed.  Numerical 
procedures  to  integrate  the  equations  inevidently  introduces  small  errors.  Instability  of  the 
numerical  scheme  seems  responsible  for  such  errors  eventually  dominating  the  recovered 
solution. 

The  alternative,  a  relaxation  procedure  to  solve  the  equations,  has  its  own  difficulties. 
The  difficulties  experienced  in  the  shape-from-shading  methods  discussed  in  the  first  part 
of  this  paper  dictate  caution.  The  importance  of  a  good  initial  solution  for  a  relaxation 
method  cannot  be  overemphasized.  Simplfying  the  two  partial  differential  equations  (using 
additional  assumptions)  provides  a  method  for  obtaining  an  good  initial  solution. 

The  spherical  approximation  assumes  that  we  are  on  a  spherical  surface.  This  implies 
ly  =  0,  ttij  —  0 ,lz  —  rny,  —  namely,  constant  curvature  independent  of  direction.  For  this 
case  the  partial  differential  equations  become  relationships  between  the  second  derivatives 
of  image  irradiance  and  the  components  of  the  surface  normal: 

1  ~  m2  _  [zz 
ITU  I 2 iy 

l-P  _  Iyy 
lm  /jy 

These  results  for  the  spherical  approximation  are  equivalent  to  those  Pentland  was  able 
to  obtain  [8]  through  local  analysis  of  the  surface.  In  addition  to  providing  a  mechanism 
for  obtaining  an  initial  solution  for  a  relaxation-style  algorithm,  their  direct  application 
estimates  the  surface  orientation  by  local  computation  [8j. 

We  are  actively  engaged  in  the  development  of  a  relaxation  procedure  to  transform  the 
initial  solution  (given  by  the  spherical  approximation)  into  a  solution  the  satifies  the  full 
equations. 


9  THE  INFLUENCE  OF  BELIEF  ON  THE  PERFORMANCE  OF 
A  VISUAL  SYSTEM 

The  constraints  derived  for  isotropic  scattering  do  not  have  to  be  true  embodiments  of 
the  physical  iaws  of  nature;  rather,  they  can  represent  the  beliefs  a  visual  system  possesses 
regarding  those  laws.  In  circumstances  in  which  such  beliefs  do  not  hold,  the  visual  system 
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will  err  in  predicting  the  world’s  true  nature.  Of  course,  if  the  model  is  not  a  good 
approximation  of  the  physical  laws  of  nature,  the  visual  system  embodying  it  is  useless. 
The  two  constraints  specifying  isotropic  scattering, 

(1  —  l2)Ru  =  (1  —  m2)Rmm  , 

(Ru  ~  Rmm)lm  =  (l2  -  m2)Rlm  , 

obviously  both  hold  when  the  scattering  is  isotropic,  but  what  is  the  situation  for  other 
forms  of  scattering? 

The  images  produced  by  a  scanning  electron  microscope  constitute  an  intriguing  situa¬ 
tion.  The  appropiate  expression  for  scene  radiance  [7]  is 


/?(/,  m)  =  a(l  + 


rl-P-m2 


where  a  is  a  constant.  This  expression  is  quite  unlike  those  for  natural  scenery,  yet  the 
human  visual  system  ‘sees’  an  image.  Note  that  the  second  constraint  for  the  isotropic 
scattering  case  is  satisfied  by  this  radiance  function,  but  not  the  first.  The  second  constraint 
is  about  surface  tilt,  as  where  r  is  the  surface  tilt;  the  first  constraint 

introduces  slant.  In  using  the  equations  relating  surface  orientation  to  image  irradiance  to 
recover  surface  orientation,  one  might  expect  them  to  predict  tilt  correctly  for  surfaces  in 
electron  microscope  images,  but  to  err  in  predicting  slant. 

For  other  forms  of  the  scene  radiance  expressions,  neither  constraint  holds.  Specular 
reflectance  has  been  approximated  [2]  by 


R(l.  rn)  —  a[6(l  —  l2  —  m2)  +  cl\/l  —  P  —  m2  +  dm\/l  —  P  —  m2]” 


where  n  is  a  constant,  usually  having  a  value  between  1  and  10  that  determines  the 
‘sharpness’  of  the  specular  peak. 

For  the  maria  of  the  moon,  the  form  of  scene  radiance  normally  used  [2]  is 


R(l,m) 


o(6f  +  cm) 


The  constants  a,  6,  c,  and  d  it  the  above  expressions  are  associated  with  the  strength  and 
position  of  the  light  source,  as  well  as  with  the  surface  albedo. 

The  constraints  do  not  hold  in  either  of  the  preceding  cases.  We  would  expect  a  visual 
system  embodying  them  to  make  errors  under  these  circumstances.  Nevertheless  this  should 
not  induce  us  to  immediately  begin  searching  for  new  visual  beliefs.  After  all  the  human 
visual  system  is  imperfect  under  conditions  of  specular  reflection;  moreover,  people  observed 
the  moon  throughout  history  without  concluding  that  it  was  spherical. 

If  these  constraints  are  incorporated  in  the  human  visual  system,  the  predictions  based 
on  them  —  i.e.,  when  the  visual  system  will  return  ostensibly  ‘correct’  and  ‘incorrect’ 
information  —  could  be  tested  by  psychophysical  experiments.  Such  predictions  together 
with  their  verification  or  refutation  are  being  investigated. 
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10  CONCLUSION 


The  shape-from-shading  task  (recovering  surface  orientation  from  image  irradiance), 
has  meant  finding  a  solution  to  the  image  irradiance  equation.  This  formulation  requires 
that  the  characteristics  of  the  scene  illumination  and  the  surface  material  be  known.  While 
these  requirements  are  difficult  to  satisify, knowing  them  makes  it  possible  to  apply  the  image 
irradiance  equation  to  any  scene  material  for  which  the  scene  radiance  function  is  known. 
Such  application,  however,  is  not  without  difficulty,  appropiate  boundary  conditions  are 
needed  and  the  effect  of  image  noise  is  uncertain. 

To  recover  surface  orientation,  relaxation-style  algorithms  based  on  the  image  irradiance 
equation  employ  additional  constraints.  These  constraints,  which  are  needed  to  supplement 
the  underdetermined  image  irradiance  equation,  capture  the  concept  of  smoothness.  While 
smoothness  superficially  determines  the  relationship  between  image  irradiance  and  surface 
orientation,  it  is  too  weak  a  concept  to  propagate  bound, ary  conditions  and  thus  equally 
ineffectual  as  a  means  of  recovering  the  required  solution. 

In  presenting  a  new  formulation  for  the  shape-from-shading  task,  we  have  traded  the 
need  to  know  the  explicit  form  of  the  scene  radiance  function  for  the  assumption  that 
material  scatters  light  isotropically.  This  model  is  applicable  to  natural  scenery  without 
additional  assumptions  about  illumination  conditions  or  the  albedo  of  the  surface  material. 
The  model  also  demonstrates  some  competence  even  when  the  scattering  is  not  isotropic. 
Such  a  model  poses  the  question:  does  the  human  visual  system  embody  a  particular  belief 
about  the  laws  of  scattering  that  it  applies  even  when  these  laws  are  inexact? 

Effective  numerical  procedures  based  on  this  new  formulation  of  the  shape-from-  shad¬ 
ing  task  remain  unknown  and,  are  subjects  for  further  development. 
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APPENDIX 


We  show  that,  the  solution  of  the  system  of  partial  differential  equations, 


(1  —  l2)Ru  =  (1  —  rn2)Rmm  , 

(Rlt-Rmm)lm  =  (l2-m2)Rlm  , 

where  Ru  is  the  second  partial  derivative  of  R  with  respect  to  /,  Rmm  is  the  second  partial 
derivative  of  R  with  respect  to  m,  and  Rim  is  the  second  partial  cross  derivative  of  R  with 
respect  to  /  and  m,  is 

R(l,m)  =  at  +  bm  +  e\/l  —  P  —  m2  +  d  , 
where  a,b,c,  and  d  are  constants. 

Proof:  Rearranging 


(1  -  /=)/?«  =(l-m2)/?, 
(Ru  -  Rmm )lm  =  (l~  -  ms)Rt 


mm  * 

m  f 


we  obtain 


Im 

Rim  =  Z  9  **ll  i 

1  —  m* 

n  /m  n 

“Im  ===  j  J2*'mm 


Integrating  Rtm  =  with  respect  to  /,  that  is, 


/  /  IRudl  , 


gives 


tfm  = 


m 


1  -  m2 


(IR1-R)+Fl(m)  , 


where  f’i(m)  is  an  arbitary  function  of  m.  Similarly,  integrating  J?/m 
respect  to  m  gives 

R,=  l~(mRm-R)  +  F2(l)  , 


. _  _{m_ 


Rmm  with 


j_|2  “mm 
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where  F3(/)  is  an  arbitary  function  of  /.  Rearranging  these  two  equations,  we  get  a  system 
of  two  first  order  partial  differential  equations 

(1  — /2  — +  IR  =  lmF3(m)  +  (1  —  m2)F4(f)  , 

( 1  -  /2  -  m2  )Rm  +  mR  =  lmF<(l)  +(1  -/2)F3(m)  , 
where  F3(m)  —  (1  —  mi)Fy(m ),  and  F4(/)  =  (1  —  l2)Fi(l).  Multiplying  both  equations  by 
the  integrating  factor  (1  —  l2  —  m2)~*,  we  obtain 

|,[(1  -  /2  -  m2)-^/?]  =  (i  -  l2  -  m2)-i[fmF3(m)  +  (1  -  m2)F4(/)]  , 

Oi 

-  /2  -  m2r  *F]  =  (1  -  /2  -  m2r§[/mF4(0  +  (1  -  /2)F3(m)]  . 

am 

[before  carrying  out  the  integration,  we  can  find  the  form  of  F3(m),  and  F4(/)  by 
differentiating  the  first  equation  with  respect  to  m  and  the  second  with  respect  to  /: 

■;  1(1  -  /2  -  m2  )"*/?]  =  (1  -  /2  -  m2)-S(/(l  -  f2  +  2m2)F3(m)  +  m(l  -  m2  +  *2/2)F4(f) 

alum 

+  lm(l  —  I2  —  m2)F'3(m)\  , 

.  ?*-  [( 1  -  /2  -  m2)-  *F]  =  (1  -  /2  -  m2)“5  [/( 1  -  l2  +  2m2)F3(m)  +  m(l  -  m2  +  2 /2)F4(/) 
Oldm 

+  /m(l  —  Z2  —  m2)F/4(/)]  , 

where  F'(fc)  indicates  differentiation  with  respect  to  the  independent  variable  k.  Hence, 

F3(m)  =  F4(/)  . 

F3( rn)  is  a  function  of  m  and  F4(/)  is  a  function  of  /;  this  implies  that 

F3(m)  =  d  , 

F'<(1)  =  d  , 

where  d  is  a  constant.  Therefore, 

F3(m)  =  dm  +  6  , 

F4(f)  =  df  +a  , 

where  a,  and  b  are  constants.  Returning  to  the  integration  step,  we  now  have  the  expressions 

f.[0  -  l2  -  m2)~hi]  =  (1  —  l2  —  m2)~*[l(bm  +  d)  +  o(l  -  m2)]  , 

al 

/-[(I -l2-m2)-^R]  =  (1  -/2-m2)-»[m(a/  +  rf)  +6(1-/2)]  . 

am 

Integrating  the  first  equation  with  respect  to  /  and  the  second  with  respect  to  m,  we  obtain 
( I  —  /2  —  rn2)-  *  R  =  (bm  +  d)(  1  —l2  —  rr2)~  *  +  al(l  —  l2  —  m2)~  *  +  F$(m)  , 

(l  —  l2  —  rn2 )~  ^  R  =  (a/  +  </)(  1  —  I2  —  m2)~^  +  6m(  1  —  l2  —  m2)~  i  +  Fe(/)  , 
where  F^(m),  and  Fe(/)  are  arbitary  functions  of  m  and  /,  respectively.  We  have  two 
expressions  for  R: 

R  =al  +  6m  +  F$(m)(  I  —l2  —  m2)*  +  d  , 

R  —al  +  6rn  +  Ffl(/)( 1  -  Z2  -  m2 )*  +  d  , 

which  are  compatible  if 

F5(tti)  =  Fe(i)  =  c  , 

where  c  is  a  constant.  The  solution  for  R  is 

R  =  al  +  bm  +  c  \f\  —  P  —  rn 2  +  d 
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Figure  1  Original  Image. 

Figure  S  ‘Ideal’  Result.  Top  left  -  profile  of  recorered  surface;  top 
right  ■  sine  slant.  blarksO,  whitesl;  bottom  left  •  cosine  lilt,  blacka-1. 


graysO,  whites  +  1;  bottom  right  -  sine  tilt. 


15 


Figure  S.  No  boundary  conditions;  planar  initial  solution  perpen¬ 
dicular  to  viewing  direction;  image  quantisation  64  X  84 
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Figure  t.  Boundary  condition  curve  on  sphere’s  surface  (square 
shape);  planar  initial  solution  perpendicular  to  viewing  direction;  image 
quantization  64  X  64. 


Figure  •.  Random  five  percent  of  points  fixed;  planar  initial  solution 
perpendicular  to  viewing  direction;  image  quantization  64  X  64. 
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Figure  8.  Boundary  at  edge  of  sphere  given;  planar  initial  solution 
perpendicular  to  viewing  direction;  image  quantisation  18  X  18. 
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Figure  11.  Smoothness  constraint  only.  Boundary  at  edge  of  sphere 
given;  planar  initial  solution  perpendicular  to  viewing  direction;  image 
quantisation  64  X  64. 
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Appendix  C 

DETECTION  OF  THE  VISIBLE  SKYLINE  IN  URBAN  SCENES  AND  NATURAL  TERRAIN 


Michael  R.  Lowry 


I  INTRODUCTION 


Although  not  always  a  well  defined  problem,  delineation  of  the  sky- 
land  boundary  provides  important  constraining  information  for  futher 
analysis  of  the  image.  Its  very  existence  in  an  image  tells  us 
something  about  the  location  of  the  camera  relative  to  the  scene  (i.e. 
that  the  scene  is  being  viewed  at  a  high-oblique  angle),  allows  us  to 
estimate  visibility,  i.e.,  how  far  we  can  see  (both  as  a  function  of 
atmospheric  viewing  conditions,  and  as  a  function  of  the  scene  content), 
provides  a  source  of  good  landmarks  for  (autonomous)  navigation,  and 
defines  the  boundary  beyond  which  the  image  no  longer  depicts  portions 
of  the  scene  having  fixed  geometric  structure. 

The  objective  of  this  project  was  to  identify  the  middle-to-f ar 
skyline  in  urban  scenes  and  natural  terrain.  These  two  domains  span 
most  of  the  problems  encountered  in  skyline  delineation.  The  principal 
difficulty  is  in  distinguishing  bright  clouds  from  bright  objects  on  the 
skyline  and  dark  clouds  from  textureless  dark  objects  on  the  skyline. 
Since  one  of  the  objectives  of  identifying  the  skyline  is  to  constrain 
further  semantic  interpretation,  the  use  of  detailed  semantic  models  was 
excluded  in  this  project.  Rather  the  skyline  was  identified  using 
physical  models  and  assumptions  valid  for  any  scene. 

One  possible  use  of  skyline  detection  is  in  landmark 
identification.  Landmark  identification  has  potential  use  in  passive 
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autonomous  navigation  and  in  cross  reference  and  compilation  between 
maps  and  photographs.  Another  use  is  to  provide  a  semantic  constraint 
on  further  image  interpretation.  The  skyline  detection  algorithm 
described  in  this  report  is  conservative.  For  images  without  confusing 
factors  the  skyline  is  identified  unambiguously,  while  in  difficult 
images  the  picture  might  be  broken  up  into  a  sky  region,  a  land  region, 
and  a  region  of  amibguity.  Even  in  these  difficult  images  the  algorithm 
serves  to  constrain  further  image  interpretation. 

Section  2  of  this  report  discusses  physical  models  of  sky,  cloud, 
and  land  luminance  and  previous  work  on  skyline  delineation.  Section  3 
documents  the  algorithm  developed  for  skyline  delineation:  at  the  core 
is  a  region  segmentation  technique  well  matched  to  the  physical 
constraints  of  the  problem.  Results  are  given  in  Section  4,  and  the 
factors  influencing  the  robustness  of  the  algorithm  are  identified. 
Section  5  presents  our  conclusions. 


II  PHYSICAL  MODELS  OF  SKY,  CLOUD,  AND  LAND  LUMINANCE 

Different  physical  mechanisms  give  rise  to  the  luminance  of  sky, 
clouds,  and  land  surfaces.  Sky  luminance  is  the  result  of  sunlight 
scattered  by  the  atmosphere.  Cloud  luminance  is  either  sunlight 
attenuated  b'y  transmission  through  a  relatively  thin  cloud  or  sunlight 
reflected  from  a  thick  cloud.  Land  and  water  surfaces  reflect  sunlight 
and  amlj  ient  1  ight  . 
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A.  Cloud  Free  Sky  Luminance^-Basic  Mechanisms 

Atmospheric  scattering  of  sunlight  occurs  through  a  combination  of 
molecular  scattering  and  aerosol  scattering.  Molecular  scattering,  also 
termed  Rayleigh  scattering,  occurs  when  the  gaseous  molecules  of  the 
atmosphere  interact  as  dipoles  with  incident  light.  The  scattering  is 
proportional  to  the  fourth  power  of  the  frequency  of  the  incident  light. 
It  is  the  higher  scattering  of  shorter  wavelenghts  which  gives  the  sky 
its  characteristically  blue  color,  and  the  sun  its  complementary 
yellowish  tinge.  An  equal  amount  of  light  is  scattered  forward  and 
backwards  along  the  direction  of  the  incident  light.  This  implies  that 
the  sky  is  equally  bright  sunside  and  antisunside. 

In  contrast  to  molecular  scattering,  aerosol  scattering  is 
wavelength  independent.  Aerosol  scattering,  also  termed  Mie  scattering 
or  large  particle  scattering,  arises  from  internal  refractions  and 
reflection  of  light  that  has  entered  a  large  particle.  Since  aerosols 
scatter  all  wavelenghts  equally,  the  scattered  light  is  white.  Thus  as 
the  sky  gets  hazier,  it  appears  whiter.  In  aerosol  scattering  there  is 
a  much  larger  forward  scattering  component  than  backward  scattering 
component.  As  a  result,  on  a  hazy  dey  the  sky  is  very  bright  sunside 
but  dark  antisunside. 

Another  key  concept  in  understanding  sky  luminance  is  the  length  of 
the  optical  path  through  the  scattering  medium  from  an  observer  to  the 
effective  edge  of  the  atmosphere.  Since  the  density  of  air  decreases  by 
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a  factor  of  10  for  every  15  kilometers  of  altitude,  the  scattering 
properties  of  the  atmosphere  effectively  terminate  at  30-60  kilometers 
of  altitude.  As  an  observer  moves  his  line  of  sight  from  the  vertical 
towards  the  horizon  the  optical  path  increases  dramatically.  Since  the 
optical  path  passes  through  proportionally  more  scattering  medium,  the 
sky  usually  appears  brighter  towards  the  horizon.  This  effect  is 
enhanced  because  the  atmospheric  scattering  mediun  near  the  earth’s 
surface  is  also  being  illuminated  by  light  reflected  from  the  land.  The 
exception  to  this  limb  effect  is  when  the  haze  is  so  dense  that  the 
increase  in  scattering  mass  is  offset  by  the  attenuation  of  the  light 
passing  through  the  dense  aerosol. 

The  relevant  parameters  for  cloud-free  sky  luminance  are  the 

position  of  the  sun  (azimuth  and  elevation),  the  aerosol  content  of  the 
atmosphere,  the  luminosity  of  the  land,  and  the  line  of  sight. 
Quantitative  predictions  for  cloud-free  sky  luminance  were  derived 

through  computer  simulation  by  J.V.  Dave  at  IBM  Palo  Alto  [Dave  80] . 
Five  different  atmospheric  models  were  used,  including  aerosol  free, 
aerosol  content  typical  over  a  land  mass,  and  aerosol  content  typical 
over  a  water  mass.  Predictions  from  this  study  include  the  following: 

(1)  The  luminosity  of  the  sky  varies  gradually  and 

continuously.  For  an  aerosol-free  sky  the  total 
variation  is  about  10:1.  For  an  average  amount  of 

aerosols  there  is  a  strong  increase  in  brightness  towards 
the  sun  and  high  noon. 

(2)  The  clear  sky  is  blue,  but  becomes  whiter  with  greater 
aerosol  content  in  the  atmosphere. 
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(3)  An  aerosol-free  sky  is  brighter  and  whiter  towards  the 
horizon  because  of  the  limb  effect.  This  enhances  the 

contrast  at  the  horizon.  However  the  horizon  contrast  is 
attenuated  under  hazy  conditions,  particularly  for  the 
distant  skyline.  The  horizon  contrast  is  a  function  of 
the  scattering  versus  the  attenuation  properties  of  the 
atmosphere . 

B.  Qloud  huminance-^Cumu lus  and  Stratus 

The  principal  difficulty  in  skyline  delineation  is  distinguishing 
clouds  from  land  features.  Since  clouds  form  a  much  more  limited  class 
than  land  features,  the  approach  taken  in  this  project  has  been  to  use 
qualitative  models  of  clouds.  Good  quantitative  models  of  cloud 
luminosity  do  not  exist.  Qualitative  predictions  can  be  made  from  basic 
theoretical  t  'siderations  and  empirical  observations.  The  basic 
physical  mechanism  governing  cloud  luminosity  is  aerosol  scattering  of 
incident  sunlight.  For  a  thin  cloud,  secondary  scattering  can  be 
ignored,  so  the  basic  mechanism  is  attenuation  of  sunlight  passing 
through  the  cloud.  This  attenuation  obeys  Bier’s  law  of  transmission — a 
decaying  exponential  function  of  the  optical  density  of  the  cloud.  For 
the  range  of  optical  densities  for  which  secondary  scattering  can  be 
ignored,  the  exponential  is  approximately  linear;  thus  the  attenuation 
is  approximately  a  linear  function  of  the  thickness  of  the  cloud.  For  a 
thick  cloud,  the  effects  of  secondary  scattering  predominate  so  the 
cloud  is  essentially  a  lambertian  surface  with  80%  reflectivity  (hence 
20%  transmission — there  is  no  adsorption  "  light  in  the  visible  range 
by  water  particles).  As  sunlight  passes  deeper  into  a  cloud, 


successively  more  and  more  of  the  light  is  scattered  in  all  directions. 
At  some  point  all  the  light  is  scattered,  and  the  illumination  is 
isotropic.  Thus  20%  of  the  original  illumination  is  the  equilibrium 
value  for  isotropic  scattering.  This  lack  of  directionality  in 
illumination  occurs  in  dense  ground  fogs  and  when  flying  through  clouds. 
Since  the  sunlit  portion  of  a  cumulus  cloud  acts  as  a  lambertian 
surface,  its  shape  can  be  determined  from  its  shading  using  the  same 
shape-f rom-shading  algorithms  used  for  other  lambertian  surfaces.  In 
contrast,  those  cloud  surfaces  whose  luminance  is  due  to  transmitted 
light  do  not  obey  the  normal  surface  shape-to-shading  relationship. 

Thick  clouds  with  large  vertical  extent  are  cumulus  clouds  created 
by  local  weather  conditions.  Local  updrafts  carry  water  vapor  into  the 
upper  atmosphere  where  it  condenses  into  water  droplets.  A  local 
downdraft  is  created  by  the  updraft,  providing  a  natural  separation 
between  cumulus  clouds.  For  a  cumulus  cloud,  the  top  is  bright  and 
reflective,  the  sides  not  in  shadow  are  similarily  bright,  and  the 
bottom  is  dark.  Because  of  the  separation  between  clouds,  it  is  usually 
possible  to  see  some  bright  reflective  surface.  The  term  "cloudy  skies" 
usually  refers  to  the  prescence  of  cumulus  clouds. 

Stratus  clouds  are  created  by  larger  weather  systems,  such  as  low- 
pressure  systems.  They  have  large  horizontal  extent  and  comparatively 
small  vertical  extent.  Consequently  from  the  ground  the  line  of  sight 
is  only  to  the  bott  3  of  the  cloud,  and  the  luminance  is  solely  from 
light  transmitted  through  the  cloud.  Overcast  skies  are  due  to  stratus 
clouds . 
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c. 


Other  Atmospheric  E?fi®2t;i2Dj.  P2lilTi?§kion 


The  oblique  angle  of  the  sun  relative  to  the  atmosphere  at  dawn  and 
dusk  causes  refraction  of  incident  sunlight  according  to  Snell’s  laws. 
The  result  is  a  temporal  separation  of  colors  as  the  sun  moves  towards 
the  horizon.  Spectacular  atmospheric  effects  can  also  be  caused  by 
alignment  of  a  large  group  of  refracting  or  reflecting  particles.  For 
example,  rainbows  are  caused  by  billions  of  raindrops  acting  as 
identical  prisms  aligned  through  gravity  and  the  effects  of  aerodynamic 
drag.  Similarly,  ice  crystals  aligned  in  a  high  cloud  can  act  as  a 
gigantic  mirror  when  viewed  from  an  airplane.  The  effects  of  refraction 
and  reflection  were  not  incorporated  into  the  implicit  assumptions  made 
by  the  skyline  delineation  algorithm. 

Polarization  of  the  clear  sky  is  due  to  the  interaction  of 
molecular  dipoles  with  incident  sunlight.  The  sun’s  radiation  can  be 
modeled  as  plane  wave  oriented  perpendicular  to  the  direction  of 
propagation.  Within  the  plane  there  is  no  preferred  direction  of 
polarization,  this  isotropy  can  be  represented  as  a  circle  in  the  plane. 
Incident  sunlight  causes  a  molecule  to  vibrate  in  sympathy;  the  light 
emitted  when  the  molecule  vibrates  is  the  scattered  light.  The 
vibration  of  the  molecule  can  be  modeled  as  a  circular  motion  in  the 
plane  perpendicular  to  the  sun’s  rays,  since  the  incident  sunlight  has 
no  preferred  polarization.  When  this  motion  is  viewed  by  an  observer 
not  directly  in  line  with  the  sun  and  the  molecule,  the  motion  appears 
to  be  tracing  out  an  ellipse.  The  difference  between  the  major  and 


minor  axis  of  the  ellipse  is  the  degree  of  polarization.  Along  any 
given  line  of  sight,  the  polarization  of  the  molecular  vibration  is  the 
same,  since  the  sun  is  effecitvely  a  source  of  parallel  light.  In  the 
presence  of  clouds  or  haze,  polarization  is  virtually  absent  since 
aerosol  scattering  occurs  through  refraction  and  reflection  as  opposed 
to  the  vibration  of  a  molecular  dipole.  Since  it  is  under  conditions  of 
cloud  and  haze  that  skyline  delineation  becomes  difficult,  polarization 
is  not  a  disambiguating  feature. 


D.  Use  of  Color  and  a  World  Model  in  the  Recognition  of  Natural 
Outdoor  Scenes 


The  doctoral  dissertation  of  Kenneth  Sloan  was  on  the  use  of  color 
and  a  world  model  in  the  recognition  of  natural  outdoor  scenes.  He  used 
color  pictures  of  wilderness  scenes  roughly  oriented  towards  the 
horizon.  His  objective  was  to  identify  regions  such  as  sky,  mountains, 
hills,  trees,  lakes,  and  snow  patches.  The  first  step  in  his  program 
was  to  segment  the  digitized  color  picture  into  regions  of  homogenous 
color.  The  regions  of  homogenous  color  were  then  given  semantic  labels 
such  as  "sky"  and  merged  with  nearby  regions  under  control  of  an 
extended  production  system.  The  productions  used  contextual  and  color 
cues1  to  assign  semantic  labels  and  merge  regions.  The  initial  goal  of 
the  production  system  was  to  identify  the  skyline  as  a  context  for 
further  processing. 
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The  opponent  process  theory  of  color  perception  [Hurvich70] 
[Sloan75]  was  used  to  provide  color  descriptions  compatible  with  human 
perception.  The  resulting  color  descriptions  correspond  to  color  names 
in  human  language,  thereby  facilitating  extension  of  the  system  to  man- 
machine  interaction  through  natural  language.  The  opponent  process 
theory  can  explain  phenomena  of  human  color  perception  such  as  color 
desaturation  at  low  and  high  intensity,  and  the  effect  of  surrounding 
regions  on  color  perception.  However  these  phenomena  rely  upon  the  non¬ 
linear  processing  of  the  initial  red/blue/green  description.  For 
reasons  of  computational  speed,  Sloan  implemented  the  opponent  process 
theory  as  a  simple  linear  transformation  from  the  initial  red/green/blue 
color  space  into  a  ye  1 low-blue/red-green/white-black  color  space.  Thus 
the  only  advantage  of  his  implemented  opponent  process  theory  was  in 
color  description,  which  was  acheived  by  quantizing  each  of  the  three 
transformed  color  dimensions  into  +,  -,  and  0.  This  yielded  27 
different  colors  that  correspond  well  to  color  names  in  human  language. 

The  initial  region  segmentation  was  done  by  a  FORTRAN  program  which 
took  a  seed  for  each  region  and  recursively  found  all  pixels  with  8- 
neighbor  connectivity  that  were  the  same  color  as  the  inital  seed  to 
within  a  small  tolerance.  Because  of  the  8-neighbor  connectivity, 
regions  could  be  very  interlocked,  as  in  a  jig  saw  puzzle.  Regions  were 
described  by  area,  color  of  initial  seed,  bounding  rectangle,  shape,  and 
texture.  Shape  and  texture  were  optional  descriptions  computed  at  the 
request  of  higher-level  routines.  Shape  v.as  described  by  the  distance 
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from  the  center  of  the  bounding  rectangle  to  the  furthest  pixel  in  each 
of  eight  directions.  Texture  was  described  by  the  "busyness"  of  an  area 
as  indicated  by  the  region  boundaries. 

High-level  processing  was  under  the  control  of  an  extended 
production  system,  which  used  the  contents  of  Short-Term  Memory  to  focus 
attention,  storing  assertions  not  in  current  focus  under  tags  in  Long- 
Term  Memory.  The  initial  goal  was  to  process  the  regions  in  order  of 
decreasing  size  until  some  region  could  be  identified  as  either  land  or 
sky.  When  this  was  done,  the  production  system  was  recursively  called 
with  a  new  set  of  rules  for  the  context  established  by  the 
identification  of  the  initial  region.  Regions  near  the  initial  region 
were  examined  to  see  if  they  could  be  merged  into  the  initial  region,  or 
used  to  establish  a  new  context.  For  example,  if  a  dark  region  was 
found  while  examining  neighbors  for  a  region  identified  as  a  patch  of 
sky,  then  the  dark  region  would  be  used  to  establish  a  context  for  land 
and  skyline. 

The  semantic  content  of  the  rules  used  for  identification  and 
merging  of  regions  were  fairly  simple.  For  example  the  initial  sky 
region  was  identified  by  being  near  the  top  of  the  picture  and  bright. 
Tree  trunks  were  identified  by  being  brown  and  vertically  elongated.  A 
nearby  region  of  similar  texture  and  color  was  merged  into  a 
semantically  identified  region.  If  one  region  was  above  and  bluer  than 
an  identified  region,  then  it  was  judged  to  be  further  away  because  of 
the  blue-shift  for  distant  objects.  Similar  use  was  made  of  texture 


discontinuties .  Dark  blue  regions  were  identified  as  water,  while  dark 
blue-green  regions  are  identified  as  mountains.  White  regions  near  a 
ground  patch  were  identified  as  snow. 

Dr.  Sloan’s  system  did  not  identify  the  complete  skylir?  in  his 
pictures.  Rather  it  identified  fragments  of  the  skyline  and  depth 
discontinuities  in  the  land,  such  as  ridgelines,  which  he  called 
generalized  skylines.  He  used  three  pictures  for  the  development  of  his 
system,  and  3  pictures  to  test  the  system.  His  system  failed  on  the 
picture  of  Mount  Rainier,  partially  because  the  glacier  was  confused 
with  a  bank  of  clouds  and  partially  because  the  initial  region 
segmentation  included  regions  consisting  of  both  sky  and  ice.  In 
another  picture,  of  the  desolation  wilderness,  the  sky  is  too  dark  to  be 
identified  as  such,  but  the  mountain  peaks  are  identified  as  depth 
discontinuties,  i.e.,  as  part  of  the  generalized  skyline.  Other  errors 
include  such  things  as  confusing  mountain  peaks  with  tree  tops  and  poor 
identification  of  bodies  of  water. 
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Ill  CURRENT  SYSTEM 


The  current  approach  is  to  identify  initial  sky  and  land  "seeds", 
that  is,  regions  in  the  scene  that  can  be  confidently  identified.  The 
sky  seed  is  classified  as  being  clear,  overcast,  or  cloudy.  This 
information  is  then  used  to  extend  the  initial  seeds  when  it  can  be  done 
with  high  confidence.  The  final  result  is  a  picture  broken  into  a  sky 
region,  a  land  region,  and  a  possibly  empty  unknown  region. 

The  picture  is  assumed  to  be  right  side  up  with  the  sky  dominating 
the  top  of  the  picture  and  the  land  filling  the  entire  last  row  in  the 
bottom  of  the  picture.  The  current  system  will  almost  always  find  the 
skyline  under  conditions  of  clear  sky  or  under  cloudy  conditions  when 
the  clouds  do  not  touch  the  skyline  nor  form  an  unbroken  chain  across 
the  sky.  This  performance  is  achieved  because,  for  any  reasonably  sized 
picture,  the  luminosity  gradient  within  the  sky  is  always  less  than  the 
smallest  integer-valued  threshold  possible  for  region  segmentation. 
Thus  the  initial  sky  seed  is  in  fact  the  entire  sky  under  clear 
conditions.  In  conditions  of  an  overcast  sky  or  when  cumulus  clouds 
touch  the  horizon,  the  system  usually  finds  a  bounded  region  in  which 
the  skyline  will  appear. 

The  initial  region  segmentation  is  done  by  setting  a  gray-level 
contrast  threshold  and  then  grouping  pixels  into  4-neighbor  connected 
regions  enclosed  by  a  boundary.  The  contrast  across  the  boundary 
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everywhere  exceeds  the  gray-level  contrast  threshold.  This  segmentation 
is  accomplished  through  a  one  pass  scan  over  the  entire  picture.  A 
region  map  is  produced,  isomorphic  to  the  original  picture,  with  each 
pixel  assigned  a  region  number.  Adjacent  pixels  are  given  the  same 
region  number  if  their  contrast  is  less  than  the  threshold  contrast; 
otherwise  they  are  assigned  different  region  numbers.  During  the  scan 
adjacent  pixels  might  be  assigned  to  different  regions  based  upon  a 
local  boundary  that  does  not  become  a  closed  contour.  In  this  case  the 
two  regions  are  merged;  otherwise  the  algorithm  is  completely 
straightforward . 

The  basic  concept  behind  this  method  of  region  segmentation  is  to 
combine  region-based  and  edge-based  descriptions  since  edges  generally 
do  not  form  closed  contours.  An  alternative  to  the  approach  we  employed 
is  to  use  some  edge  operator  that  implicitly  computes  edge  data  and 
region  data.  The  Laplacian  operator  has  been  used  to  derive  edge  data 
from  zero-crossing  contours,  it  can  also  be  viewed  as  breaking  a  picture 
up  into  positive  and  negative  regions.  However  it  was  empirically 
determined  that  the  zero-crossing  contour  correponding  to  the  skyline 
was  almost  inevitably  broken  at  several  points,  causing  the  sky  region 
to  bleed  into  the  land  region.  These  breaks  were  usually  associated 
with  high-contrast  objects  near  the  skyline,  and  could  be  quite  large  in 
extent.  The  larger  the  DOG  (difference  of  Gaussian)  operator,  the  more 
likely  that  large  objects  near  the  skyline  would  fall  within  the  same 
window  as  the  skyline,  causing  a  break  in  the  skyline  contour.  Smaller 


DOG  operators  were  extremely  noisy,  with  many  small  contours  appearing 
within  the  sky  in  addition  to  the  skyline. 

Finding  region  boundaries  using  gray-level  difference  thresholds 
corresponds  to  edge  detection  using  gradients.  Two  problems  arose,  the 
first  being  noise  causing  locally  high  gradients  and  the  second  being 
large-scale  changes  escaping  detection  because  of  a  low  gradient.  A 
somewhat  less  frequent  problem  was  noise  acting  as  a  stepping  stone 
between  regions  and  causing  them  to  merge.  These  problems  had  a  uniform 
solution:  reduce  the  picture  by  averaging  each  NxN  square  of  pixels  into 
one  pixel  on  a  coarser  grid.  This  cancels  high-frequency  noise  because 
the  averaging  acts  as  a  low-pass  filter.  The  reduction  in  size  causes 
large-scale  changes  to  be  reduced  to  changes  on  the  scale  of  pixel 
neighbors.  Very  large-scale  changes,  such  as  the  gradual  change  of 
luminosity  over  the  sky,  remain  undetected.  This  is  an  advantage  over 
region-segmentation  techniques  that  look  for  regions  of  constant 
intensity  (or  constant  color),  given  that  the  sky  becomes  whiter  towards 
the  horizon.  An  added  advantage  of  reducing  the  picture  is  that  the 
computation  time  is  reduced  by  a  factor  of  NxN. 

A  4-by-4  picture  reduction  was  choosen  arbitrarily.  Empirical 
investigation  showed  that  the  region-segmentation  technique  was  fairly 
robust  over  a  wide  range  of  reduction  factors.  The  constraints  are  that 
the  reduction  average  a  sufficient  number  of  pixels  to  cancel  out  noise 
and  sharpen  a  hazy  skyline. 
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The  skyline-detection  algori  hm  uses  a  hierarchy  of  region 


segmentations.  The  hierarchy  is  formed  by  using  five  different  gray- 
level  difference  thresholds  for  segmentation:  2,  4,  8,  16,  32.  Any 
region  found  using  a  threshold  of  2  is  necessarily  a  subregion  of  a 
region  found  using  a  threshold  of  4.  In  general,  if  T1  and  T2  are 
thresholds  such  that  T1  <  T2,  then  regions  segmented  using  T1  are 
subregions  of  regions  segmented  using  T2.  A  threshold  of  2  is  the 
lowest  threshold  that  does  not  require  regions  to  be  of  constant 
intensity.  The  doubling  of  thresholds  is  an  arbitrary  choice,  which 
provides  coverage  of  edge  strenghths  from  2  to  N  with  only  log  (2)  N 
segmentations.  It  was  empirically  determined  that  an  upper  threshold  of 
32  was  sufficient  for  a  maximum  gray-level  value  of  255. 

The  initial  sky  seed  is  determined  with  the  region  segmentation  of 
threshold  2;  the  region  with  the  largest  number  of  pixels  in  the  top  8% 
of  a  picture  is  choosen  as  the  sky  seed.  The  assumption  being  made  is 
that  some  subregion  of  the  true  sky  is  larger  than  any  land  subregion  in 
the  top  8%  of  the  picture. 

The  initial  land  seed  is  found  using  the  assumption  that  the 
highest-contrast  continuous  boundary  spanning  the  entire  horizontal 
extent  of  the  picture  is  at  or  below  the  skyline.  This  assumption  had 
only  one  minor  exception  in  a  corpus  of  15  pictures  used  for  system 
development.  A  bright  cumulus  cloud  on  the  horizon  formed  a  higher- 


contrast  boundary  with  the  clear  sky  above  it  than  with  the  bright  rock 
below  it  in  the  picture  labeled  "Bishop".  The  method  used  to  find  the 


initial  land  seed  using  this  assumption  was  to  grow  the  initial  sky  seed 
until  it  touched  the  bottom  of  the  picture.  The  growth  was  accomplished 
by  using  successively  higher  thresholds  for  finding  region  boundaries. 
Since  the  sky  is  assumed  not  to  touch  the  bottom  of  the  picture,  it  is 
necessary  to  back  off  the  threshold  once  the  sky  region  touches  the 
bottom  of  the  picture,  this  is  done  by  halving  the  threshold  by  a  factor 
of  2.  The  initial  land  seed  is  found  by  finding  the  transitive  closure 
of  all  regions  connected  to  the  bottom  of  the  picture  through 
neighboring  nonsky  regions. 

The  initial  sky  seed  is  classified  as  clear,  cloudy,  or  overcast 
according  to  the  following  criteria.  If  large  regions  are  enclosed 
within  the  initial  sky  region,  the  sky  is  assumed  to  be  cloudy.  These 
large  regions  usually  correspond  to  bright  cumulus  clouds,  which  form 
high-contrast  boundaries.  Sometimes  the  initial  sky  seed  will  be  a 
cloud,  in  which  case  the  interior  regions  (holes)  would  be  patches  of 
clear  sky.  A  clear  sky  is  approximated  well  by  a  linear  intensity 
function  over  small  solid  angles.  Except  for  pictures  taken  with 
extremely  wide-angle  lenses,  the  assumption  of  a  small  solid  angle  is 
valid.  Theoretically  the  picture  should  also  lie  entirely  within  one 
quadrant  of  the  sky  with  respect  to  the  position  of  the  sun,  since  there 
is  usually  an  inflection  point  in  the  luminosity  of  the  sky  at  the 
azimuth  and  elevation  of  the  sun.  In  practice  the  latter  criteria  does 
not  make  a  difference,  so  the  intensity  of  the  clear  sky  is  well  modeled 
as  a  linear  function  of  the  x  and  y  coordinates.  An  overcast  sky  is 
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illuminated  by  sunlight  transmitted  through  layers  of  clouds.  The 
optical  thickness  of  the  clouds  determines  the  luminosity.  The  optical 
thickness  is  usually  not  well  approximated  by  a  linear  function  over  the 
viewing  angle  of  the  picture.  The  initial  sky  seed  is  classified  as 
clear  if  the  mean  sqaure  error  of  the  least  squares  linear  approximation 
is  less  than  the  mean  pixel  noise.  The  pixel  noise  was  computed 
assuming  that  the  gray-level  differences  between  adjacent  pixels  in  the 
picture  were  predominantly  due  to  noise.  A  histogram  was  made  of 
adjacent  pixel  gray-level  differences,  and  the  66%  percentile  was 
assumed  to  represent  the  standard  deviation  in  an  underlying  gaussian 
distribution.  This  gives  a  good  upper  bound  when  noise  is  due  to  the 
digitization  process. 


IV  RESULTS 


The  skyline  delineation  algorithm  was  tested  on  a  corpus  of  15 
pictures.  These  pictures  were  deliberately  selected  at  the  beginning  of 
the  project  to  present  difficulties  to  any  mechanical  system.  The 
corpus  can  be  classified  as  follows: 


Clear  Sky  :2 
Haze  :  2 
Cloudy  :  6 
Overcast  :  5 
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A.  Confusing  Factors 

In  four  pictures  there  is  contrast  reversal  at  the  horizon,  that  is 
objects  on  the  horizon  are  brighter  than  the  sky  above  them.  In  seven 
of  the  pictures  there  are  portions  of  the  sky  with  lower  intensity 
values  than  some  objects  on  the  horizon.  In  five  of  the  pictures  some 
boundaries  within  the  sky  form  a  higher  contrast  than  the  skyline 
boundary . 

B.  F£Ef2E?3!}£?  of  Skyline  Delineation  Algorithm 

Clear  Sky:  In  these  cases  the  program  found  the  skyline  with  no 
ambiguity.  The  region  segmentation  is  well  matched  to  the  gradual 
change  of  intensity  over  the  clear  sky,  and  the  brightening  of  the  sky 
at  the  horizon  increases  the  skyline  contrast,  usually  making  it  the 
highest-contrast  boundary  in  the  picture. 

Haze:  In  one  case  the  initial  sky  seed  found  in  the  picture  was 
identical  to  the  sky,  while  about  half  the  land  was  declared  ambiguous. 
In  the  other  case  the  program  mistakenly  identified  the  near  skyline  as 
the  real  skyline.  The  picture  was  taken  from  the  top  of  a  mountain  with 
Vancouver  Bay  in  the  background  of  a  ridge  line.  The  bay,  and  the 
peninsula  beyond,  are  so  hazy  that  the  program  misidentif ies  them  as  a 
cloud  (as  do  most  humans) .  In  many  hazy  pictures  the  horizon  is  so 
indistinct  that  the  best  any  interpreter  could  do,  mechanical  or  human, 
would  be  to  identify  some  region  of  ambiguity  in  which  the  skyline  must 
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In  none  of  these  pictues  is  the  skyline  found 
unambiguously.  In  three  of  these  six  pictures  the  sky  portion  and  the 
land  portion  of  the  picture  touch  at  some  points.  In  one  of  these 
pictures  a  small  error  is  made  where  a  cumulus  cloud  is  included  as  part 
of  the  land  seed.  The  cumulus  cloud  has  a  very  similar  intensity  to  the 
bright  rock  beneath  it,  as  opposed  to  the  darker  sky  above  it.  In  one 
of  these  pictures  a  cloud/snow  boundary  has  no  contrast. 

Overcast:  In  one  of  the  overcast  pictures,  the  skyline  was 
identified  without  ambiguity.  In  the  remaining  four  pictures,  the 
ambiguous  region  extended  over  various  portions  of  the  picture.  With  an 
overcast  picture  the  land  seed  is  apt  to  cover  most  of  the  actual  land 
in  the  picture,  while  the  sky  is  broken  at  horizontal  layers.  Thus,  in 
most  of  the  pictures,  the  ambiguous  portion  is  largely  within  the  sky 
region 


V  CONCLUSIONS 

The  principal  difficulty  in  image  understanding  is  to  translate  a 
given  interpretation  problem  into  one  of  identifying  observable  features 
of  an  image.  This  requires  an  analysis  of  the  underlying  physical 
mechanisms  governing  the  visual  properties  of  the  objects  of  interest 
and  the  image-acquisition  process.  The  analysis  yields  constraints  for 
translating  from  observable  features  back  into  semantic  objects.  When 
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the  problem  is  completely  constrained  the  translation  is  unambiguous, 
otherwise  ambiguity  remains. 

This  analysis  was  carried  out  for  the  problem  of  differentiating 
sky  from  land.  The  analysis  principally  focused  on  the  question  of  how 
to  extend  a  point  identified  as  sky  into  as  large  a  portion  of  the  true 
sky  without  misidentifying  some  portion  of  land  as  part  of  the  sky.  The 
principal  observable  due  to  physical  mechanisms  of  sky  luminance  is 
gradual  change.  Unlike  other  problems  where  geometric  structure  is 
transformed  into  observables  through  the  imaging  process,  the  sky  has  no 
geometric  structure  and  there  is  no  fixed  model  of  the  geometric 
structure  in  the  land.  The  geometric  structure  which  is  assumed  is  that 
the  picture  is  taken  right  side  up  with  the  sky  dominating  the  top  of 
the  picture  and  not  touching  the  bottom  of  the  picture.  This 
presuppostion  is  equivalent  to  requiring  the  question  of  skyline 
delineation  to  be  me-  gful  for  the  images  analyzed. 

An  algorithm  (a  region-based  segmentation  process  that  splits 
regions  at  discontinuities)  based  upon  these  constraints  was  constructed 
and  tested.  The  algorithm  incorporates  sufficient  constraints  to  find 
the  skyline  unambiguously  in  easy  cases,  while  delineating  an  ambiguous 
region  in  more  difficult  cases.  The  only  major  mistakes  it  made  were  in 
cases  of  extreme  haze,  where  humans  would  also  make  mistakes. 


•  1 


'•I 


•  1 


20 


REFERENCES 


[DaveSO]  Dave,  V.J.  (September  1980)  "Atmospheric  Modeling  -  Transfer  of 
Visible  Radiation  in  the  Atmosphere",  Palo  Alto  Scientific  Center, 
IBM  Report  No.  6320-3411.  To  be  published  in  Atmospheric 
Env iorment . 


[HurvichTO]  Hurvich,  Leo  M. ,  and  Jameson,  Dorothea.  "Color  Vision  and 
color  Coding",  in  PERCEPTION  AND  ITS  DISORDERS  Res.  Publ. 
A.R.N.M.D.  Vol .  XLVIII,  The  Association  for  Research  in  Nervous  and 
Mental  Disease. 


[Sloan75]  Sloan,  Kenneth  K.  and  Bajcsy,  Puzena.  "A  Computational 
Structure  for  Color  Perception",  ACM  1975  Proceedings,  pp  42-45. 


[Sloan77]  Sloan,  Kenneth.  "WORLD  MODEL  DRIVEN  RECOGNITION  of  NATURAL 
SCENES",  Ph'^  Thesis,  1977,  University  of  Pennsylvania,  The  Moore 
School  of  Electrical  Engineering,  Philadelphia,  Pennsylvania  19104. 


21 


FIGURE  3  THRESHOLD  IS  DOUBLED  UNTIL 
REGION  CONTAINING  SKY  SEED 
ALMOST  TOUCHES  "BOTTOM" 

OF  THE  PICTURE.  THE  LAND 
SEED  IS  THE  COMPLEMENT. 


FIGURE  4  PICTURE  SEGMENTED  INTO  SKY 
SEED.  UNCLASSIFIED  PORTION, 
AND  LAND  SEED 
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